Elasticsearch group by and max on a text field of type Keyword

Mihir_Kothari · July 16, 2019, 7:43pm

I am using the Elasticsearch 6.5. I am looking to find the documents with some filter condition (working fine) and then I need to take only docs which has the MAX timestamp (field in the doc).

Basically when I search the index using below payload, I get the data as shown below (only important fields shown, data has other fields and stuff)

```
    {
      "query": {
            "bool": {
                "must": [
                    {
                        "match": { "myfield.date" : "2019-07-02" }
                    },
                    {
                        "match": { "myfield.data" : "ABC" }
                    },
                    {
                        "bool": {
                            "should": [
                                {
                                    "regexp": { "myOtherFieldId": "myregex1" }
                                },
                                {
                                    "regexp": { "myOtherFieldId": "myregex2" }
                                }
                            ]
                        }
                    }
                ]
            }
        }
    }
    ```

Basically, I am getting the following docs. So I need only those docs which has max timestamp out of these filter. So docs having id: PQR_20190704100000_0199

date, book, id, timestamp
2019-07-02, ABC, PQR_20190703130000_1234, 2019-07-03 13:01:00
2019-07-02, ABC, PQR_20190703140000_234, 2019-07-03 14:01:00
2019-07-02, ABC, PQR_20190704100000_0199, 2019-07-04 10:01:00

I tried something like this. Please note that the "id" here is type keyword.
"Id":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}}

{
  "query": {
        "bool": {
            "must": [
                {
                    "match": { "myfield.date" : "2019-07-02" }
                },
                {
                    "match": { "myfield.data" : "ABC" }
                },
                {
                    "bool": {
                        "should": [
                            {
                                "regexp": { "myOtherFieldId": "myregex1" }
                            },
                            {
                                "regexp": { "myOtherFieldId": "myregex2" }
                            }
                        ]
                    }
                }
            ]
        }
    },
"aggs": {
    "group_by_id" : {
        "terms": { 
            "field": "field1.Id"
        },
        "aggs": {
            "timeStamp": {
                "max": { 
                    "field": "field1.Id"
                }
            }
        }
    },
    "max_timestamp": {
        "max_bucket": {
            "buckets_path": "group_by_id>timeStamp"
        }
    }
},
"size": "10000"
}

I am still getting all the documents in output i.e. not only with relevant id, older ids too.
The data retrieved shows NULL for the MAX values.

"aggregations": {
        "group_by_id": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "PQR_20190703130000_1234",
                    "doc_count": 947,
                    "timeStamp": {
                        "value": null
                    }
                },
                {
                    "key": "PQR_20190703140000_234",
                    "doc_count": 947,
                    "timeStamp": {
                        "value": null
                    }
                },
                {
                    "key": "PQR_20190704100000_0199",
                    "doc_count": 947,
                    "timeStamp": {
                        "value": null
                    }
                }
            ]
        },
        "max_timestamp": {
            "value": null,
            "keys": []
        }
    }

What am I missing here?

Mark_Harwood · July 19, 2019, 12:29pm

I guess you want the last-sighting for an ID but I'm unclear if this would be for multiple IDs or just one.
One way to get the last doc for an ID is to use the top_hits aggregation under a terms aggregation for the ID field. Set the size of the top_hits aggregation to 1 and use the sort option to sort by reverse date order. This should ensure you get the latest doc for each unique ID.

system · August 16, 2019, 12:29pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Elasticsearch: aggregation and select docs only having max value of field Elasticsearch	3	2852	August 28, 2019
How to parse below SQL query to ES? Elasticsearch	10	2831	November 6, 2017
In elasticsearch finding documents which meet a specific criteria for the latest for each group Elasticsearch	3	173	April 28, 2023
How to compare document fields in a elasticsearch query Elasticsearch	2	584	September 26, 2023
Keyword search using Elastic search Elasticsearch	3	642	November 16, 2019

Elasticsearch group by and max on a text field of type Keyword

Related topics