Elasticsearch group by and max on a text field of type Keyword

I am using the Elasticsearch 6.5. I am looking to find the documents with some filter condition (working fine) and then I need to take only docs which has the MAX timestamp (field in the doc).

Basically when I search the index using below payload, I get the data as shown below (only important fields shown, data has other fields and stuff)

```
    {
      "query": {
            "bool": {
                "must": [
                    {
                        "match": { "myfield.date" : "2019-07-02" }
                    },
                    {
                        "match": { "myfield.data" : "ABC" }
                    },
                    {
                        "bool": {
                            "should": [
                                {
                                    "regexp": { "myOtherFieldId": "myregex1" }
                                },
                                {
                                    "regexp": { "myOtherFieldId": "myregex2" }
                                }
                            ]
                        }
                    }
                ]
            }
        }
    }
    ```

Basically, I am getting the following docs. So I need only those docs which has max timestamp out of these filter. So docs having id: PQR_20190704100000_0199

date, book, id, timestamp
2019-07-02, ABC, PQR_20190703130000_1234, 2019-07-03 13:01:00
2019-07-02, ABC, PQR_20190703140000_234, 2019-07-03 14:01:00
2019-07-02, ABC, PQR_20190704100000_0199, 2019-07-04 10:01:00

I tried something like this. Please note that the "id" here is type keyword.
"Id":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}}

{
  "query": {
        "bool": {
            "must": [
                {
                    "match": { "myfield.date" : "2019-07-02" }
                },
                {
                    "match": { "myfield.data" : "ABC" }
                },
                {
                    "bool": {
                        "should": [
                            {
                                "regexp": { "myOtherFieldId": "myregex1" }
                            },
                            {
                                "regexp": { "myOtherFieldId": "myregex2" }
                            }
                        ]
                    }
                }
            ]
        }
    },
"aggs": {
    "group_by_id" : {
        "terms": { 
            "field": "field1.Id"
        },
        "aggs": {
            "timeStamp": {
                "max": { 
                    "field": "field1.Id"
                }
            }
        }
    },
    "max_timestamp": {
        "max_bucket": {
            "buckets_path": "group_by_id>timeStamp"
        }
    }
},
"size": "10000"
}

I am still getting all the documents in output i.e. not only with relevant id, older ids too.
The data retrieved shows NULL for the MAX values.

"aggregations": {
        "group_by_id": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "PQR_20190703130000_1234",
                    "doc_count": 947,
                    "timeStamp": {
                        "value": null
                    }
                },
                {
                    "key": "PQR_20190703140000_234",
                    "doc_count": 947,
                    "timeStamp": {
                        "value": null
                    }
                },
                {
                    "key": "PQR_20190704100000_0199",
                    "doc_count": 947,
                    "timeStamp": {
                        "value": null
                    }
                }
            ]
        },
        "max_timestamp": {
            "value": null,
            "keys": []
        }
    }

What am I missing here?

I guess you want the last-sighting for an ID but I'm unclear if this would be for multiple IDs or just one.
One way to get the last doc for an ID is to use the top_hits aggregation under a terms aggregation for the ID field. Set the size of the top_hits aggregation to 1 and use the sort option to sort by reverse date order. This should ensure you get the latest doc for each unique ID.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.