Looking for a max-like bucket aggregation


#1

Hi everyone,
I'm quite new to Elastic and not too familiar with its query DSL yet. I have an Elastic database with documents containing vehicles' positions at various times. Due to bad reception, some documents don't contain a location field.

Here's the mapping:

PUT /agg_test
{
    "settings" : {
        "number_of_shards" : 1,
        "number_of_replicas" : 1
    },
    "mappings": {
        "_doc": {
            "properties": {
                "location": {
                    "type": "geo_point"
                },
                "time": {
                  "type": "date"
                },
                "vehicle": {
                  "type": "keyword"
                }
            }
        }
    }
}

And here's the data:

POST /agg_test/_doc/_bulk?refresh
{"index":{"_id":1}}
{"location": "52.374081,4.912350", "time": "2018-01-01T13:00:00Z", "vehicle": "vcl_1"}
{"index":{"_id":2}}
{"name": "Museum Het Rembrandthuis","time": "2018-01-01T14:00:00Z", "vehicle": "vcl_1"}
{"index":{"_id":3}}
{"location": "52.371667,4.914722", "time": "2018-01-01T13:00:00Z", "vehicle": "vcl_2"}
{"index":{"_id":4}}
{"location": "51.222900,4.405200", "time": "2018-01-01T14:00:00Z", "vehicle": "vcl_2"}
{"index":{"_id":5}}
{"location": "48.861111,2.336389", "time": "2018-01-01T13:00:00Z", "vehicle": "vcl_3"}
{"index":{"_id":6}}
{"time": "2018-01-01T14:00:00Z", "vehicle": "vcl_3"}

I want to find the last known position of each vehicle, together with the time it was recorded, to use for visualisation. My idea was to filter for the documents that have a location, group them by vehicle, get the maximum timestamp (or, better, the record with the maximum timestamp) and then ...

What I currently have is:

GET /agg_test/_search
{
  "query": {
    "exists": { "field": "location"}
  },
  "aggs": {
    "last_position": {
      "terms": {
        "field": "vehicle"
      },
      "aggs": {
        "max_time": {
          "max": {
            "field": "time"
          }
        }
      }
    }
  },
  "size": 0
}

As you can see, I'm stuck here: there doesn't seem to be a bucket aggregation that builds each bucket from the document where some field is maximized. Alternatively, I'd need a way to feed the output of this query (each vehicle with its timestamp) into a second query that fetches the locations. I don't want to do that manually, though; I'd like a single query to do all of it. I also looked for a pipeline aggregation that could handle this task.

As far as I know, none of the things I described above can be written as a query in Elastic.

What is the best practice for queries like the one I want to make in Elastic? Thank you!


(Zachary Tong) #2

I'm not quite sure I understand. Do you want the name field of the most recent position for each vehicle?

The easiest way to get this information is a top_hits aggregation: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-metrics-top-hits-aggregation.html

E.g. a terms agg on vehicle, then a top_hits sub-aggregation sorted by time descending with a size of 1. That should give you the most recent document for each vehicle.
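
A minimal sketch of that query against the agg_test index from the first post, keeping the original exists filter so that only documents with a location are considered. The aggregation names by_vehicle and last_position are arbitrary labels chosen for illustration, and the _source filter is optional:

```json
GET /agg_test/_search
{
  "size": 0,
  "query": {
    "exists": { "field": "location" }
  },
  "aggs": {
    "by_vehicle": {
      "terms": { "field": "vehicle" },
      "aggs": {
        "last_position": {
          "top_hits": {
            "size": 1,
            "sort": [ { "time": { "order": "desc" } } ],
            "_source": { "includes": ["location", "time"] }
          }
        }
      }
    }
  }
}
```

Each vehicle bucket should then carry a single hit with the location and time of its latest located document. Note that the terms aggregation returns only the top 10 buckets by default, so with more vehicles its own size parameter would need to be raised.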

