Access integer array via doc without sorting?

PhracturedBlue · February 16, 2020, 3:40am

I was trying to do some scripting on the new 'histogram' datatype that comes with 7.6, but histograms don't support scripting.
I thought I might be able to use arrays instead, but have not had any luck because arrays are sorted when accessed through the doc object. I know I could use _source, but that will be painful once I have a large number of documents.

What I have:

{
    "_source" : {
      "data": {
         "values" : [5.0, 5.5, 6.0, 7.0, 10.0, 20.0, 30.0],
         "counts" : [0, 60, 0, 0, 0, 0, 0]
      }
    },
    "fields" : {
      "scripted" : [0, 0, 0, 0, 0, 0, 60]
    }
  },

with a query like:

    "script_fields": {
      "scripted": {
        "script": {
          "source": "doc['data.counts']"
        }
      }
    }

I thought I could possibly get around this by altering the mapping to have 'index: false, doc_values: true', but it had no effect on the query.
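
For readers reproducing this: the reordering is consistent with numeric doc values being stored in sorted order per document, whereas _source keeps the JSON array as it was indexed. The following is a minimal Python sketch of that effect. It only simulates the behavior with `sorted` and does not talk to Elasticsearch; `read_via_doc_values` is a hypothetical helper name, not an Elasticsearch API.

```python
# Simulation only: models how a multi-valued numeric field looks when it is
# read back through doc values (sorted ascending) versus through _source.

def read_via_doc_values(values):
    """Hypothetical stand-in for doc['field']: values come back sorted."""
    return sorted(values)

source_counts = [0, 60, 0, 0, 0, 0, 0]  # order as stored in _source

doc_counts = read_via_doc_values(source_counts)
print(doc_counts)  # [0, 0, 0, 0, 0, 0, 60]
```

The printed list matches the "scripted" field in the response shown earlier: the position of the non-zero count is lost.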

Is there any work-around, or should I just store my data as an object instead of an array?

Ignacio_Vera · February 17, 2020, 4:27pm

Hi!

The histogram field indeed does not support scripting. I am very curious about your use case: why do you need to access the counts through a script? Are you able to share it?

PhracturedBlue · February 17, 2020, 4:43pm

I store pre-aggregated data in Elasticsearch. I want to generate heatmaps from this data, but Kibana doesn't support heatmaps from pre-aggregated data, so instead I generate them using Grafana.
In the past I've done this by defining queries per swimlane with data like:

    data.bucket_0_5: 10,
    data.bucket_5_10: 5,
    data.bucket_10_20: 0,
    ...

And just doing a query on each field for each row in the heatmap.

I was hoping to do this with an Elasticsearch histogram instead, because it should be more space-efficient and would mean I don't need to hand-craft each query in Grafana, but I can't find a way to extract the value at a specific index.
I tried replacing my key/value pairs with arrays, but that doesn't work due to array sorting.
Instead I've currently chosen a Prometheus-esque storage solution:

    data.values: [5, 10, 20, ...]
    data.counts: [10, 15, 15, ...]

Here the counts are cumulative, so both arrays are already in ascending order and sorting leaves them unchanged.
I can then use scripting to extract data by index:

    return doc['data.counts'][i] - doc['data.counts'][i-1]
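
As a sanity check on the subtraction above, here is a small Python sketch (illustration only, not Painless) that recovers per-bucket counts from the cumulative array. `bucket_count` is a hypothetical name, and treating the first bucket as its own cumulative value is an assumption, since index 0 has no predecessor.

```python
def bucket_count(cumulative, i):
    """Count in bucket i, given cumulative (non-decreasing) counts."""
    if i == 0:
        return cumulative[0]  # assumption: first bucket has no predecessor
    return cumulative[i] - cumulative[i - 1]

cumulative = [10, 15, 15]  # the data.counts example above
per_bucket = [bucket_count(cumulative, i) for i in range(len(cumulative))]
print(per_bucket)  # [10, 5, 0]
```

The result lines up with the earlier key/value layout (bucket_0_5: 10, bucket_5_10: 5, bucket_10_20: 0).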

I can also generate a linearized 'max' approximation via:

    def c = doc['data.counts'];
    def v = doc['data.values'];
    // Return the bucket boundary after which the cumulative count stops growing.
    for (int i = 0; i < c.length - 1; i++) {
        if (c[i] == c[i + 1]) {
            return v[i];
        }
    }
    return v[-1];
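
Note that the loop above returns at the first pair of equal neighbouring counts, so a document whose cumulative counts begin with repeated values (for example 0, 0, ...) would end the scan early. One possible variant, sketched here in Python for illustration, compares each entry against the final total instead. `max_upper_bound` is a hypothetical name, and the sketch assumes both arrays are non-empty and of equal length.

```python
def max_upper_bound(values, cumulative):
    """Upper boundary of the highest non-empty bucket.

    Returns the first boundary at which the cumulative count has already
    reached the overall total (the last cumulative entry).
    Assumes non-empty lists of equal length.
    """
    total = cumulative[-1]
    for v, c in zip(values, cumulative):
        if c == total:
            return v
    return values[-1]  # not reached for non-empty input

print(max_upper_bound([5, 10, 20], [10, 15, 15]))  # 10
print(max_upper_bound([5, 10, 20], [0, 0, 7]))     # 20
```

For the first input this agrees with the original loop; the second input is the leading-empty-buckets case where the two differ.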

This solution is actually less space-efficient than the original key/value solution, but it does make various manipulations like the one above easier.
