Access integer array via doc without sorting?

I was trying to do some scripting on the new 'histogram' datatype that comes with 7.6, but histograms don't support scripting.
I thought I might be able to use arrays instead, but have not had any luck because array values come back sorted when accessed through the doc object (doc values). I know I could use _source, but that will be painful in the future when I have a large number of documents.

What I have:

{
    "_source" : {
      "data": {
         "values" : [5.0, 5.5, 6.0, 7.0, 10.0, 20.0, 30.0],
         "counts" : [0, 60, 0, 0, 0, 0, 0]
      }
    },
    "fields" : {
      "scripted" : [0, 0, 0, 0, 0, 0, 60]
    }
  },

With a query like:

    "script_fields": {
      "scripted": {
        "script": {
          "source": "doc['data.counts']"
        }
      }
    }

I thought I could possibly get around this by altering the mapping to have 'index: false, doc_values: true', but it had no effect on the query.
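
For context, the reordering comes from how numeric doc values are stored: each document's values are kept in ascending order, so the original array positions are not available through `doc[...]`, whatever the `index` setting is. Here is a minimal Python sketch of the effect, using the sample counts from above. `read_via_doc_values` is a hypothetical stand-in for illustration, not an Elasticsearch API.

```python
# Hypothetical stand-in for reading a numeric array through doc values:
# the values come back in ascending order, not in _source order.
def read_via_doc_values(source_values):
    return sorted(source_values)

counts = [0, 60, 0, 0, 0, 0, 0]          # order as stored in _source
scripted = read_via_doc_values(counts)   # order as seen by the script

print(scripted)  # [0, 0, 0, 0, 0, 0, 60]
```

The position of the 60 (second bucket) is lost, which matches the `scripted` field in the response above.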

Is there any workaround, or should I just store my data as an object instead of an array?

Hi!

Indeed, the histogram field does not support scripting. I am curious about your use case: why do you need to access the counts through a script? Are you able to share it?

I store pre-aggregated data in Elasticsearch. I want to generate heatmaps from this data, but Kibana doesn't support heatmaps from pre-aggregated data, so instead I generate these using Grafana.
In the past I've done this by defining queries per swimlane, with data like:

    data.bucket_0_5: 10,
    data.bucket_5_10: 5,
    data.bucket_10_20: 0,
    ...

I then ran one query on each field, one for each row in the heatmap.

I was hoping to do this with an Elasticsearch histogram instead, because it should be more space-efficient and would mean I don't need to hand-craft each query in Grafana, but I can't find a way to extract the count at a specific index.
I tried substituting my key/value pairs with arrays, but that doesn't work due to array sorting.
Instead I've currently chosen a Prometheus-esque storage solution:

    data.values: [5, 10, 20, ...]
    data.counts: [10, 15, 15, ...]

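Where the counts are cumulative. Both arrays are then non-decreasing, so the sorting applied when reading through doc leaves their order intact.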
I can then use scripting to extract data by index:

    return doc['data.counts'][i] - doc['data.counts'][i-1]
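
To make the encoding concrete, here is a minimal Python sketch (not Painless) of the round trip: per-bucket counts are turned into cumulative counts before indexing, and a single bucket is recovered by subtracting its predecessor. The helper names are hypothetical, and the special case for index 0 is an assumption, since the one-line script above only covers i >= 1.

```python
from itertools import accumulate

def to_cumulative(bucket_counts):
    """Convert per-bucket counts into running totals."""
    return list(accumulate(bucket_counts))

def bucket_count(cumulative, i):
    """Recover the count of bucket i from cumulative counts.

    Mirrors doc['data.counts'][i] - doc['data.counts'][i-1];
    bucket 0 has no predecessor, so its cumulative value is its count.
    """
    if i == 0:
        return cumulative[0]
    return cumulative[i] - cumulative[i - 1]

per_bucket = [10, 5, 0]                  # e.g. bucket_0_5, bucket_5_10, bucket_10_20
cumulative = to_cumulative(per_bucket)   # [10, 15, 15]

# Cumulative counts are already in ascending order, so sorting is a no-op.
assert sorted(cumulative) == cumulative

print([bucket_count(cumulative, i) for i in range(len(cumulative))])  # [10, 5, 0]
```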

I can also generate a linearized 'max' approximation via:

    def c = doc['data.counts'];
    def v = doc['data.values'];
    // The first bucket after which the cumulative count stops growing bounds the max.
    for (int i = 0; i < c.length - 1; i++) {
        if (c[i] == c[i + 1]) {
            return v[i];
        }
    }
    return v[-1];
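
The same logic as a minimal Python sketch, which is easier to try out against sample data. It assumes `values` holds ascending bucket upper bounds and `counts` the matching cumulative counts; `approx_max` is a hypothetical name.

```python
def approx_max(values, counts):
    """Return the upper bound of the first bucket after which
    the cumulative count stops increasing."""
    for i in range(len(counts) - 1):
        if counts[i] == counts[i + 1]:
            return values[i]
    # The count grew all the way to the last bucket.
    return values[-1]

print(approx_max([5, 10, 20], [10, 15, 15]))  # 10
print(approx_max([5, 10, 20], [10, 15, 20]))  # 20
```

Note that, as written, a run of equal counts at the start (empty leading buckets) also triggers the early return.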

This solution is actually less space-efficient than the original key/value solution, but it does make it easier to do various manipulations like the one above.
