Elasticsearch: aggregation and select docs only having max value of field

Mihir_Kothari · July 31, 2019, 6:07pm

I am using Elasticsearch 6.5.

My query can return multiple documents from my index, but I need only those documents that have the max value for a particular field. E.g., this query:

{
  "query": {
        "bool": {
            "must": [
                {
                    "match": { "header.date" : "2019-07-02" }
                },
                {
                    "match": { "header.field" : "ABC" }
                },
                {
                    "bool": {
                        "should": [
                            {
                                "regexp": { "body.meta.field": "myregex1" }
                            },
                            {
                                "regexp": { "body.meta.field": "myregex2" }
                            }
                        ]
                    }
                }
            ]
        }
    },
    "size": 10000
}

The above query returns many documents/messages. Sample data returned:

    "header" : {
        "id" : "Text_20190702101200123_111",
        "date" : "2019-07-02",
        "field" : "ABC"
    },
    "body" : {
        "meta" : {
            "field" : "myregex1",
            "timestamp" : "2019-07-02T10:12:00.123Z"
        }
    }
    -----------------
    "header" : {
        "id" : "Text_20190702151200123_121",
        "date" : "2019-07-02",
        "field" : "ABC"
    },
    "body" : {
        "meta" : {
            "field" : "myregex2",
            "timestamp" : "2019-07-02T15:12:00.123Z"
        }
    }
    -----------------
    "header" : {
        "id" : "Text_20190702081200133_124",
        "date" : "2019-07-02",
        "field" : "ABC"
    },
    "body" : {
        "meta" : {
            "field" : "myregex1",
            "timestamp" : "2019-07-02T08:12:00.133Z"
        }
    }

So from the 3 documents above, I want only the one with the max timestamp, i.e. "timestamp": "2019-07-02T15:12:00.123Z". In this example that is a single document.

Please note that I can have more than one message with the same timestamp, so I want all of them, i.e. all the messages/documents belonging to the max timestamp. I tried the following:

{
  "query": {
        "bool": {
            "must": [
                {
                    "match": { "header.date" : "2019-07-02" }
                },
                {
                    "match": { "header.field" : "ABC" }
                },
                {
                    "bool": {
                        "should": [
                            {
                                "regexp": { "body.meta.field": "myregex1" }
                            },
                            {
                                "regexp": { "body.meta.field": "myregex2" }
                            }
                        ]
                    }
                }
            ]
        }
    },
    "aggs": {
        "group": {
            "terms": {
                "field": "header.id",
                "order": { "group_docs" : "desc" }
            },
            "aggs": {
                "group_docs": { "max" : { "field": "body.meta.timestamp" } }
            }
        }
    },
    "size": 10000
}

Executing the above, I still get all 3 documents instead of only one (or only the ones that match the max timestamp).
I do get the buckets, but I need only one of them, not all of them. The aggregation output, returned in addition to all the records:

"aggregations": {
        "group": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "Text_20190702151200123_121",
                    "doc_count": 29,
                    "group_docs": {
                        "value": 1564551683867,
                        "value_as_string": "2019-07-02T15:12:00.123Z"
                    }
                },
                {
                    "key": "Text_20190702101200123_111",
                    "doc_count": 29,
                    "group_docs": {
                        "value": 1564551633912,
                        "value_as_string": "2019-07-02T10:12:00.123Z"
                    }
                },
                {
                    "key": "Text_20190702081200133_124",
                    "doc_count": 29,
                    "group_docs": {
                        "value": 1564510566971,
                        "value_as_string": "2019-07-02T08:12:00.133Z"
                    }
                }
            ]
        }
    }

What am I missing here?

I am looking for something like HAVING (after GROUP BY) in SQL.
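
One way to end up with a single bucket, sketched here in Python as a plain request body: bucket on the timestamp itself rather than on header.id, order the buckets by key descending, keep only the first, and pull that bucket's documents with a top_hits sub-aggregation. The function name, the aggregation names "latest" and "docs", and the max_docs parameter are made up for illustration; only the field name comes from the query above. As far as I know, top_hits is capped at 100 documents per bucket by default, so this is a sketch that fits only when the max-timestamp group is small.

```python
def latest_bucket_request(query, ts_field="body.meta.timestamp", max_docs=100):
    """Build a search body whose only bucket is the highest timestamp.

    `query` is the existing bool query, passed through unchanged.
    All names other than the field are hypothetical.
    """
    return {
        "size": 0,  # suppress the regular hits; documents come from top_hits
        "query": query,
        "aggs": {
            "latest": {
                "terms": {
                    "field": ts_field,
                    "size": 1,                  # keep a single bucket
                    "order": {"_key": "desc"},  # highest timestamp first
                },
                "aggs": {
                    # documents that belong to that one bucket
                    "docs": {"top_hits": {"size": max_docs}}
                },
            }
        },
    }


body = latest_bucket_request({"match": {"header.field": "ABC"}})
```

The resulting dict can be serialized with json.dumps and sent to the index's _search endpoint; the matching documents would then be under aggregations.latest.buckets[0].docs.hits.hits.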

RahulD · July 31, 2019, 7:41pm

If you always want the top N results that satisfy your filter, sorted by a field, you can run something like:

{
  "size": N,
  "query": {
    "bool": {
      "must": [
        {},
        {}
      ]
    }
  },
  "sort": [
    {
      "sortfieldname": {
        "order": "desc"
      }
    }
  ]
}

Mihir_Kothari · July 31, 2019, 8:45pm

This won't work. As I mentioned, I will have more than one message with the same timestamp. In the above example there are 29 messages for each timestamp (it can be any number), so my query retrieves 29 * 3 messages.
If I give size = N, N is not known in advance. And after just sorting, the total number of messages retrieved will still be 29 * 3 in the above example.

I am looking for a way, once I have bucketed these messages, to retrieve only one bucket, i.e. the one with the max timestamp, and no documents or messages from any other bucket.
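
Since the number of messages at the max timestamp is not known in advance, a two-request approach may be the simplest fit. This is a rough Python sketch that only builds the two request bodies: the first asks for the max timestamp with no hits, the second reruns the same query restricted to exactly that timestamp. The function names, the aggregation name "max_ts", and the size default are assumptions for illustration; the field name is the one from the thread.

```python
def max_timestamp_request(query, ts_field="body.meta.timestamp"):
    """Request 1: return no hits, only the max value of the timestamp field."""
    return {
        "size": 0,
        "query": query,
        "aggs": {"max_ts": {"max": {"field": ts_field}}},
    }


def docs_at_timestamp_request(query, max_ts_millis,
                              ts_field="body.meta.timestamp", size=10000):
    """Request 2: the original query, limited to documents at max_ts_millis.

    max_ts_millis is the numeric "value" of the max aggregation (epoch
    milliseconds for a date field). A range with equal bounds acts as an
    equality filter on the date.
    """
    point = int(max_ts_millis)
    return {
        "size": size,
        "query": {
            "bool": {
                "must": [query],
                "filter": [
                    {
                        "range": {
                            ts_field: {
                                "gte": point,
                                "lte": point,
                                "format": "epoch_millis",
                            }
                        }
                    }
                ],
            }
        },
    }
```

In use, the value would be read from the first response at aggregations.max_ts.value and passed to docs_at_timestamp_request. Every document returned by the second request then shares the max timestamp, however many there are (paging would still be needed beyond the size limit).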

system · August 28, 2019, 8:45pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
