Elasticsearch: aggregation and select docs only having max value of field

I am using Elasticsearch 6.5.

My query can return multiple documents from the index, but I need only those documents that have the maximum value for a particular field. For example:

{
  "query": {
        "bool": {
            "must": [
                {
                    "match": { "header.date" : "2019-07-02" }
                },
                {
                    "match": { "header.field" : "ABC" }
                },
                {
                    "bool": {
                        "should": [
                            {
                                "regexp": { "body.meta.field": "myregex1" }
                            },
                            {
                                "regexp": { "body.meta.field": "myregex2" }
                            }
                        ]
                    }
                }
            ]
        }
    },
"size" : 10000
}

The above query returns many documents/messages. Sample data returned:

"header" : {
    "id" : "Text_20190702101200123_111",
    "date" : "2019-07-02",
    "field" : "ABC"
},
"body" : {
    "meta" : {
        "field" : "myregex1",
        "timestamp" : "2019-07-02T10:12:00.123Z"
    }
}
-----------------
"header" : {
    "id" : "Text_20190702151200123_121",
    "date" : "2019-07-02",
    "field" : "ABC"
},
"body" : {
    "meta" : {
        "field" : "myregex2",
        "timestamp" : "2019-07-02T15:12:00.123Z"
    }
}
-----------------
"header" : {
    "id" : "Text_20190702081200133_124",
    "date" : "2019-07-02",
    "field" : "ABC"
},
"body" : {
    "meta" : {
        "field" : "myregex1",
        "timestamp" : "2019-07-02T08:12:00.133Z"
    }
}

Of these 3 documents, I want only the one with the max timestamp, i.e. "timestamp": "2019-07-02T15:12:00.123Z". In this example that is a single document.

Please note that there can be more than one message with the same timestamp. In that case I want them all, i.e. all the messages/documents belonging to the max timestamp. This is the aggregation I tried:

{
  "query": {
        "bool": {
            "must": [
                {
                    "match": { "header.date" : "2019-07-02" }
                },
                {
                    "match": { "header.field" : "ABC" }
                },
                {
                    "bool": {
                        "should": [
                            {
                                "regexp": { "body.meta.field": "myregex1" }
                            },
                            {
                                "regexp": { "body.meta.field": "myregex2" }
                            }
                        ]
                    }
                }
            ]
        }
    },
    "aggs": {
        "group": {
            "terms": {
                "field": "header.id",
                "order": { "group_docs" : "desc" }
            },
            "aggs" : {
                "group_docs": { "max" : { "field": "body.meta.timestamp" } }
            }
        }
    },
    "size": 10000
}

Executing the above, I still get all 3 documents instead of only the one (or the ones) matching the max timestamp.
I do get the buckets, but I need only one of them, not all of them. In addition to all the records, the output contains:

"aggregations": {
        "group": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {
                    "key": "Text_20190702151200123_121",
                    "doc_count": 29,
                    "group_docs": {
                        "value": 1564551683867,
                        "value_as_string": "2019-07-02T15:12:00.123Z"
                    }
                },
                {
                    "key": "Text_20190702101200123_111",
                    "doc_count": 29,
                    "group_docs": {
                        "value": 1564551633912,
                        "value_as_string": "2019-07-02T10:12:00.123Z"
                    }
                },
                {
                    "key": "Text_20190702081200133_124",
                    "doc_count": 29,
                    "group_docs": {
                        "value": 1564510566971,
                        "value_as_string": "2019-07-02T08:12:00.133Z"
                    }
                }
            ]
        }
    }

What am I missing here?

Is there something like HAVING (after GROUP BY) in SQL?

If you always want the top N results that satisfy your filter, sorted by a field, you can run something like:

{
  "size": N,
  "query": {
    "bool": {
      "must": [
        {},
        {}
      ]
    }
  },
  "sort": [
    {
      "sortfieldname": {
        "order": "desc"
      }
    }
  ]
}

This won't work. As I mentioned, I will have more than one message with the same timestamp. In the above example there are 29 messages for each timestamp (it can be any number), so my query retrieves 29 * 3 messages.
If I give size = N, N is not known in advance. With sorting alone, the total number of messages retrieved will still be 29 * 3 in the above example.

I am looking for something where, having bucketed these messages, I retrieve only one bucket, i.e. the one with the max timestamp, and no documents or messages from any other bucket.
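
One possible approach, sketched under assumptions and not tested against your index: aggregations do not filter the hits, so a common workaround is to run two searches. This assumes body.meta.timestamp is mapped as a date (the value_as_string in your output suggests it is); the aggregation name max_ts is arbitrary. The first search keeps your query, returns no hits, and only computes the maximum timestamp:

```json
{
  "size": 0,
  "query": {
    "bool": {
      "must": [
        { "match": { "header.date": "2019-07-02" } },
        { "match": { "header.field": "ABC" } },
        {
          "bool": {
            "should": [
              { "regexp": { "body.meta.field": "myregex1" } },
              { "regexp": { "body.meta.field": "myregex2" } }
            ]
          }
        }
      ]
    }
  },
  "aggs": {
    "max_ts": { "max": { "field": "body.meta.timestamp" } }
  }
}
```

The second search repeats the query and adds a filter that pins the timestamp to the value_as_string returned for max_ts. The date below is only the maximum from your sample data, standing in for whatever the first search returns:

```json
{
  "size": 10000,
  "query": {
    "bool": {
      "must": [
        { "match": { "header.date": "2019-07-02" } },
        { "match": { "header.field": "ABC" } },
        {
          "bool": {
            "should": [
              { "regexp": { "body.meta.field": "myregex1" } },
              { "regexp": { "body.meta.field": "myregex2" } }
            ]
          }
        }
      ],
      "filter": [
        {
          "range": {
            "body.meta.timestamp": {
              "gte": "2019-07-02T15:12:00.123Z",
              "lte": "2019-07-02T15:12:00.123Z"
            }
          }
        }
      ]
    }
  }
}
```

Every document that shares the maximum timestamp matches the filter, so N does not need to be known up front; size only caps the result.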
