Store latest metrics in "current" index for query based on latest values?

cyrilleverrier · January 25, 2017, 10:14am

Hi,

My "simple" use case turns out to be more difficult than expected.

Consider many sensors that each send a numeric metric (@timestamp + key/value pair).
I want to get a sorted list of the top XXX sensors based on their latest value by @timestamp.

For instance:

POST mybeat/metrics/
{
  "@timestamp": "2017-01-25T08:00:00.000Z",
  "sensorID": "Sensor1",
  "my_data": 10
}

POST mybeat/metrics/
{
  "@timestamp": "2017-01-25T08:00:00.000Z",
  "sensorID": "Sensor2",
  "my_data": 20
}

POST mybeat/metrics/
{
  "@timestamp": "2017-01-25T08:10:00.000Z",
  "sensorID": "Sensor1",
  "my_data": 1
}

POST mybeat/metrics/
{
  "@timestamp": "2017-01-25T08:10:00.000Z",
  "sensorID": "Sensor2",
  "my_data": 2
}

POST mybeat/metrics/_search?filter_path=aggregations
{
  "size":0,
  "aggregations": {
    "BY_SENSOR": {
      "terms": {
        "field": "sensorID.keyword"
      },
      "aggregations": {
        "LATEST_TIMESTAMP": {
          "terms": {
            "field": "@timestamp",
            "order": {
              "_term": "desc"
            },
            "size": 1
          },
          "aggregations": {
            "LATEST_VALUE": {
              "avg": {
                "field": "my_data"
              }
            }
          }
        }
      }
    }
  }
}

The search returns the following result:

{
  "aggregations": {
    "BY_SENSOR": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "Sensor1",
          "doc_count": 2,
          "LATEST_TIMESTAMP": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 1,
            "buckets": [
              {
                "key": 1485331800000,
                "key_as_string": "2017-01-25T08:10:00.000Z",
                "doc_count": 1,
                "LATEST_VALUE": {
                  "value": 1
                }
              }
            ]
          }
        },
        {
          "key": "Sensor2",
          "doc_count": 2,
          "LATEST_TIMESTAMP": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 1,
            "buckets": [
              {
                "key": 1485331800000,
                "key_as_string": "2017-01-25T08:10:00.000Z",
                "doc_count": 1,
                "LATEST_VALUE": {
                  "value": 2
                }
              }
            ]
          }
        }
      ]
    }
  }
}

Now, I want to perform a "top_hits" aggregation on "BY_SENSOR>LATEST_TIMESTAMP>LATEST_VALUE".
But it seems that this kind of aggregation is not yet supported (see https://github.com/elastic/elasticsearch/issues/21135).

I guess that the optimal solution for this kind of use case, which is rather common, is to have a dedicated
Elasticsearch index that stores only the latest value of the metrics for each "sensor ID".

Any feedback on that matter?

ruflin · January 26, 2017, 12:19pm

It seems the only workaround currently is to do this on the client side. Good that you posted it in the GitHub issue; that should give it the most visibility.
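
A minimal sketch of what such a client-side workaround could look like, assuming the aggregation response shown earlier in the thread has already been fetched and parsed into a Python dict. The function name `top_sensors_by_latest_value` is hypothetical, and the sample response is trimmed to the fields the function reads:

```python
from typing import Any


def top_sensors_by_latest_value(response: dict[str, Any], n: int) -> list[tuple[str, float]]:
    """Return up to n (sensorID, latest value) pairs, highest value first."""
    latest = []
    for sensor in response["aggregations"]["BY_SENSOR"]["buckets"]:
        ts_buckets = sensor["LATEST_TIMESTAMP"]["buckets"]
        if not ts_buckets:
            continue  # sensor bucket without any timestamp bucket
        # The query asked for size 1 ordered by _term desc, so the first
        # bucket is the most recent timestamp for this sensor.
        latest.append((sensor["key"], ts_buckets[0]["LATEST_VALUE"]["value"]))
    latest.sort(key=lambda pair: pair[1], reverse=True)
    return latest[:n]


# Trimmed copy of the response posted above.
response = {
    "aggregations": {
        "BY_SENSOR": {
            "buckets": [
                {
                    "key": "Sensor1",
                    "LATEST_TIMESTAMP": {
                        "buckets": [{"key": 1485331800000, "LATEST_VALUE": {"value": 1}}]
                    },
                },
                {
                    "key": "Sensor2",
                    "LATEST_TIMESTAMP": {
                        "buckets": [{"key": 1485331800000, "LATEST_VALUE": {"value": 2}}]
                    },
                },
            ]
        }
    }
}

print(top_sensors_by_latest_value(response, 10))
# [('Sensor2', 2), ('Sensor1', 1)]
```

Note that a terms aggregation returns only 10 buckets by default, so the "size" of BY_SENSOR would have to be large enough to cover every sensor for the client-side sort to be correct.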

cyrilleverrier · January 26, 2017, 12:24pm

Link to github issue: https://github.com/elastic/beats/issues/3473

ruflin · January 26, 2017, 12:50pm

Argh, sorry, misunderstanding. I meant that the Elasticsearch GitHub issue is a great place. OK for you to just keep the Elasticsearch one and close the one opened in Beats?

cyrilleverrier · January 27, 2017, 4:36pm

@ruflin; Keeping the ES bug in github is fine.

Meanwhile, I found a way to duplicate the metrics into a "latest" index by using an ES ingest pipeline that sets "_id" to the ID of the sensor and "_index" to "latest_metrics":

PUT /_ingest/pipeline/transform-latest-metric HTTP/1.1
{
  "processors": [
    {
      "script": {
        "lang": "painless",
        "inline": "ctx._id = ctx.sensor_id ?: ctx._id; ctx._index = ctx.latest_index ?: ctx._index"
      }
    }
  ]
}
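
To illustrate why this yields one document per sensor, here is a small Python simulation of the script's effect. It is only a sketch and does not talk to Elasticsearch: the "index" is a plain dict, the helper names `apply_script` and `index_document` are hypothetical, and it assumes each document carries the `sensor_id` and `latest_index` fields that the script reads (the example documents earlier in the thread use `sensorID` instead). It also assumes documents arrive in timestamp order, since a late-arriving older document would overwrite a newer one.

```python
import uuid


def apply_script(ctx: dict) -> dict:
    """Mimic the Painless script: 'a ?: b' keeps a unless it is null."""
    out = dict(ctx)
    if out.get("sensor_id") is not None:
        out["_id"] = out["sensor_id"]
    if out.get("latest_index") is not None:
        out["_index"] = out["latest_index"]
    return out


def index_document(store: dict, source: dict, index: str = "mybeat") -> None:
    """Index a document through the simulated pipeline.

    A document posted without an ID gets a generated one; the script may then
    override both _id and _index. Same (_index, _id) means overwrite.
    """
    ctx = apply_script({**source, "_index": index, "_id": uuid.uuid4().hex})
    body = {k: v for k, v in ctx.items() if k not in ("_index", "_id")}
    store[(ctx["_index"], ctx["_id"])] = body


store: dict = {}
events = [
    ("2017-01-25T08:00:00.000Z", "Sensor1", 10),
    ("2017-01-25T08:00:00.000Z", "Sensor2", 20),
    ("2017-01-25T08:10:00.000Z", "Sensor1", 1),
    ("2017-01-25T08:10:00.000Z", "Sensor2", 2),
]
for timestamp, sensor, value in events:
    index_document(store, {
        "@timestamp": timestamp,
        "sensor_id": sensor,               # assumed field name, as in the script
        "latest_index": "latest_metrics",  # assumed field name, as in the script
        "my_data": value,
    })

# Each sensor now has exactly one document in "latest_metrics", so the
# "top N by latest value" question becomes a plain sort on my_data.
top = sorted(
    ((doc["sensor_id"], doc["my_data"]) for doc in store.values()),
    key=lambda pair: pair[1],
    reverse=True,
)
print(top)
# [('Sensor2', 2), ('Sensor1', 1)]
```

Against the real index, the equivalent of the final sort would be an ordinary search on "latest_metrics" sorted by "my_data" in descending order.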

ruflin · January 30, 2017, 8:22am

Cool, thanks for sharing.

system · February 15, 2017, 10:15am

This topic was automatically closed after 21 days. New replies are no longer allowed.
