I was trying to do some scripting on the new 'histogram' datatype that comes with 7.6, but histograms don't support scripting.
I thought I might be able to use arrays instead, but have not had any luck because arrays are sorted when accessing the _doc object. I know I could use _source, but that will be painful in the future when I have a large number of documents.
What I have:
{
  "_source" : {
    "data": {
      "values" : [5.0, 5.5, 6.0, 7.0, 10.0, 20.0, 30.0],
      "counts" : [0, 60, 0, 0, 0, 0, 0]
    }
  },
  "fields" : {
    "scripted" : [0, 0, 0, 0, 0, 0, 60]
  }
}
with a query like:
"script_fields": {
  "scripted": {
    "script": {
      "source": "doc['data.counts']"
    }
  }
}
I thought I could possibly get around this by altering the mapping to have 'index: false, doc_values: true', but it had no effect on the query.
Is there any work-around, or should I just store my data as an object instead of an array?
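To make the sorting problem concrete, here is a minimal Python sketch of what seems to be happening. It assumes that numeric doc values for a multi-valued field are returned in ascending order with duplicates kept, which matches the "scripted" output above. The helper name doc_values_view is hypothetical and only simulates that behaviour; it is not an Elasticsearch API.

```python
def doc_values_view(source_array):
    """Simulate reading a numeric array through doc values.

    Assumption: values come back sorted ascending, duplicates kept,
    so the original positions from _source are lost.
    """
    return sorted(source_array)


# The arrays exactly as stored in _source in the example document.
values = [5.0, 5.5, 6.0, 7.0, 10.0, 20.0, 30.0]
counts = [0, 60, 0, 0, 0, 0, 0]

# What a script sees for doc['data.counts'] under the assumption above.
sorted_counts = doc_values_view(counts)

# Pairing by position works on _source but not on the sorted view.
paired_from_source = list(zip(values, counts))
paired_from_doc = list(zip(doc_values_view(values), sorted_counts))

print(sorted_counts)
print(paired_from_source[1], paired_from_doc[-1])
```

In the sorted view the count of 60 ends up paired with 30.0 instead of 5.5, which is why indexing into doc['data.counts'] cannot recover the per-bucket counts.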
Hi!
Indeed, the histogram field does not support scripting. I'm curious about your use case: why do you need to access the counts through a script? Are you able to share it?
I store pre-aggregated data in Elasticsearch. I want to generate heatmaps from this data, but Kibana doesn't support heatmaps from pre-aggregated data, so I generate them using Grafana instead.
In the past I've done this by defining queries per swimlane, with data like:
data.bucket_0_5: 10,
data.bucket_5_10: 5,
data.bucket_10_20: 0,
...
I then run a query on each field for each row in the heatmap.
I was hoping to do this with an Elasticsearch histogram instead, because it should be more space-efficient and would mean I don't need to hand-craft each query in Grafana, but I can't find a way to extract the value at a specific index.
I tried substituting my key/value pairs with arrays, but that doesn't work due to array sorting.
Instead, I've currently chosen a Prometheus-style storage layout:
data.values: [5, 10, 20, ...]
data.counts: [10, 15, 15, ...]
Where the counts are cumulative.
I can then use scripting to extract data by index:
return doc['data.counts'][i] - doc['data.counts'][i-1]
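The reason this layout survives doc-value sorting appears to be that cumulative counts never decrease, so sorting them leaves the order unchanged. A minimal Python sketch of the idea follows; the function names to_cumulative and bucket_count are hypothetical, and the first bucket is handled separately because there is no previous element to subtract.

```python
from itertools import accumulate


def to_cumulative(per_bucket_counts):
    """Turn per-bucket counts into running totals (non-decreasing)."""
    return list(accumulate(per_bucket_counts))


def bucket_count(cumulative_counts, i):
    """Recover the count of bucket i from the cumulative counts."""
    if i == 0:
        # No previous bucket: the first cumulative value is the count itself.
        return cumulative_counts[0]
    return cumulative_counts[i] - cumulative_counts[i - 1]


# Per-bucket counts from the earlier key/value example: 10, 5, 0.
cumulative = to_cumulative([10, 5, 0])

# Sorting a non-decreasing list is a no-op, so positions are preserved.
survives_sorting = sorted(cumulative) == cumulative

recovered = [bucket_count(cumulative, i) for i in range(len(cumulative))]
print(cumulative, survives_sorting, recovered)
```

The same reasoning applies to data.values as long as the bucket bounds are stored in ascending order.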
I can also generate a linearized 'max' approximation via:
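def c = doc['data.counts'];
def v = doc['data.values'];
// Walk the cumulative counts; the first bucket whose count equals
// the next one is taken as the max.
for (int i = 0; i < c.length - 1; i++) {
  if (c[i] == c[i + 1]) {
    return v[i];
  }
}
// The count kept increasing up to the last bucket.
return v[-1];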
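One caveat with this loop is that it stops at the first plateau, so an empty bucket in the middle of the distribution would end the search early. The Python sketch below mirrors the loop and compares it with a possible variant that looks for the first bucket whose cumulative count already equals the final total. Both function names are hypothetical, and both assume non-empty arrays of equal length.

```python
def max_first_plateau(values, counts):
    """Mirror of the loop above: stop at the first repeated cumulative count."""
    for i in range(len(counts) - 1):
        if counts[i] == counts[i + 1]:
            return values[i]
    return values[-1]


def max_last_increase(values, counts):
    """Variant: upper bound of the last bucket that added observations."""
    total = counts[-1]
    for i, c in enumerate(counts):
        if c == total:
            return values[i]


# No gaps: both approaches agree.
agree_a = max_first_plateau([5, 10, 20], [10, 15, 15])
agree_b = max_last_increase([5, 10, 20], [10, 15, 15])

# Empty bucket between 5 and 10: the first-plateau loop returns early.
gap_a = max_first_plateau([5, 10, 20, 50], [10, 10, 15, 15])
gap_b = max_last_increase([5, 10, 20, 50], [10, 10, 15, 15])

print(agree_a, agree_b, gap_a, gap_b)
```

If empty buckets in the middle cannot occur in the data, the two versions return the same result and the simpler loop is enough.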
This solution is actually less space-efficient than the original key/value solution, but it does make manipulations like the one above easier.