Store latest metrics in "current" index for query based on latest values?

Hi,

My "simple" use case turns out to be more difficult than expected.

considering many sensors that send a numeric metric (@timestamp + key/value pair)
I want to get a sorted list of the top XXX sensors based on their latest value by @timestamp

For instance:

POST mybeat/metrics/
{
  "@timestamp": "2017-01-25T08:00:00.000Z",
  "sensorID": "Sensor1",
  "my_data": 10
}

POST mybeat/metrics/
{
  "@timestamp": "2017-01-25T08:00:00.000Z",
  "sensorID": "Sensor2",
  "my_data": 20
}

POST mybeat/metrics/
{
  "@timestamp": "2017-01-25T08:10:00.000Z",
  "sensorID": "Sensor1",
  "my_data": 1
}

POST mybeat/metrics/
{
  "@timestamp": "2017-01-25T08:10:00.000Z",
  "sensorID": "Sensor2",
  "my_data": 2
}

POST mybeat/metrics/_search?filter_path=aggregations
{
  "size":0,
  "aggregations": {
    "BY_SENSOR": {
      "terms": {
        "field": "sensorID.keyword"
      },
      "aggregations": {
        "LATEST_TIMESTAMP": {
          "terms": {
            "field": "@timestamp",
            "order": {
              "_term": "desc"
            },
            "size": 1
          },
          "aggregations": {
            "LATEST_VALUE": {
              "avg": {
                "field": "my_data"
              }
            }
          }
        }
      }
    }
  }
}

The search returns the following result:

{
  "aggregations": {
    "BY_SENSOR": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "Sensor1",
          "doc_count": 2,
          "LATEST_TIMESTAMP": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 1,
            "buckets": [
              {
                "key": 1485331800000,
                "key_as_string": "2017-01-25T08:10:00.000Z",
                "doc_count": 1,
                "LATEST_VALUE": {
                  "value": 1
                }
              }
            ]
          }
        },
        {
          "key": "Sensor2",
          "doc_count": 2,
          "LATEST_TIMESTAMP": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 1,
            "buckets": [
              {
                "key": 1485331800000,
                "key_as_string": "2017-01-25T08:10:00.000Z",
                "doc_count": 1,
                "LATEST_VALUE": {
                  "value": 2
                }
              }
            ]
          }
        }
      ]
    }
  }
}

Now, I want to perform a "top_hits" aggregations on "BY_SENSOR>LATEST_TIMESTAMP>LATEST_VALUE".
But it seems that this kind of aggregation is not yet supported (see See https://github.com/elastic/elasticsearch/issues/21135)

I guess that the optimal solution for this kind of use case, which is rather common, is to have a dedicated
elastic search index that stores the latest value of the metrics with the same "sensor ID"

Any feedback on that matter?

It seems the only work around is currently doing this on the client side :frowning: Good you posted it in the github issue. That should give it the most visibility.

Link to github issue: https://github.com/elastic/beats/issues/3473

Argh, sorry misunderstanding. I meant the elasticsearch github issue is a great place. Ok for you to just keep the elasticsearch one and close the one opened in beats?

@ruflin; Keeping the ES bug in github is fine.

Meanwhile, I found a way to duplicate the metrics in the "latest" index by using a ES ingest pipeline to transform the "_id" with the ID of the sensor and "_index" field with "latest_metrics"

PUT /_ingest/pipeline/transform-latest-metric HTTP/1.1
{
    "processors": [ 
         { "script":{
              "lang":"painless",
               "inline":"ctx._id = ctx.sensor_id ?: ctx._id ; ctx._index = ctx.latest_index ?: ctx._index"}
    }]
}

Cool, thanks for sharing.

This topic was automatically closed after 21 days. New replies are no longer allowed.