Store latest metrics in "current" index for query based on latest values?


#1

Hi,

My "simple" use case turns out to be more difficult than expected.

considering many sensors that send a numeric metric (@timestamp + key/value pair)
I want to get a sorted list of the top XXX sensors based on their latest value by @timestamp

For instance:

POST mybeat/metrics/
{
  "@timestamp": "2017-01-25T08:00:00.000Z",
  "sensorID": "Sensor1",
  "my_data": 10
}

POST mybeat/metrics/
{
  "@timestamp": "2017-01-25T08:00:00.000Z",
  "sensorID": "Sensor2",
  "my_data": 20
}

POST mybeat/metrics/
{
  "@timestamp": "2017-01-25T08:10:00.000Z",
  "sensorID": "Sensor1",
  "my_data": 1
}

POST mybeat/metrics/
{
  "@timestamp": "2017-01-25T08:10:00.000Z",
  "sensorID": "Sensor2",
  "my_data": 2
}

POST mybeat/metrics/_search?filter_path=aggregations
{
  "size":0,
  "aggregations": {
    "BY_SENSOR": {
      "terms": {
        "field": "sensorID.keyword"
      },
      "aggregations": {
        "LATEST_TIMESTAMP": {
          "terms": {
            "field": "@timestamp",
            "order": {
              "_term": "desc"
            },
            "size": 1
          },
          "aggregations": {
            "LATEST_VALUE": {
              "avg": {
                "field": "my_data"
              }
            }
          }
        }
      }
    }
  }
}

The search returns the following result:

{
  "aggregations": {
    "BY_SENSOR": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "Sensor1",
          "doc_count": 2,
          "LATEST_TIMESTAMP": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 1,
            "buckets": [
              {
                "key": 1485331800000,
                "key_as_string": "2017-01-25T08:10:00.000Z",
                "doc_count": 1,
                "LATEST_VALUE": {
                  "value": 1
                }
              }
            ]
          }
        },
        {
          "key": "Sensor2",
          "doc_count": 2,
          "LATEST_TIMESTAMP": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 1,
            "buckets": [
              {
                "key": 1485331800000,
                "key_as_string": "2017-01-25T08:10:00.000Z",
                "doc_count": 1,
                "LATEST_VALUE": {
                  "value": 2
                }
              }
            ]
          }
        }
      ]
    }
  }
}

Now, I want to perform a "top_hits" aggregations on "BY_SENSOR>LATEST_TIMESTAMP>LATEST_VALUE".
But it seems that this kind of aggregation is not yet supported (see See https://github.com/elastic/elasticsearch/issues/21135)

I guess that the optimal solution for this kind of use case, which is rather common, is to have a dedicated
elastic search index that stores the latest value of the metrics with the same "sensor ID"

Any feedback on that matter?


(ruflin) #2

It seems the only work around is currently doing this on the client side :frowning: Good you posted it in the github issue. That should give it the most visibility.


#3

Link to github issue: https://github.com/elastic/beats/issues/3473


(ruflin) #4

Argh, sorry misunderstanding. I meant the elasticsearch github issue is a great place. Ok for you to just keep the elasticsearch one and close the one opened in beats?


#5

@ruflin; Keeping the ES bug in github is fine.

Meanwhile, I found a way to duplicate the metrics in the "latest" index by using a ES ingest pipeline to transform the "_id" with the ID of the sensor and "_index" field with "latest_metrics"

PUT /_ingest/pipeline/transform-latest-metric HTTP/1.1
{
    "processors": [ 
         { "script":{
              "lang":"painless",
               "inline":"ctx._id = ctx.sensor_id ?: ctx._id ; ctx._index = ctx.latest_index ?: ctx._index"}
    }]
}

(ruflin) #6

Cool, thanks for sharing.


(system) #7

This topic was automatically closed after 21 days. New replies are no longer allowed.