Run a query after grouping and finding max

Here is some test data I have in an index:

{"name" : "almara" , "version" : "1" , "groups" : "blueHouse", "data1" : " value1"}
{"name" : "almara" , "version" : "2" , "groups" : "blueHouse", "data1" : " Something"}
{"name" : "almara" , "version" : "3" , "groups" : "blueHouse", "data1" : " Something"}
{"name" : "almara" , "version" : "4" , "groups" : "blueHouse", "data1" : " value1"}

{"name" : "donna" , "version" : "1" , "groups" : "redHouse", "data1" : " value1"}
{"name" : "donna" , "version" : "2" , "groups" : "redHouse", "data1" : " Something"}
{"name" : "donna" , "version" : "3" , "groups" : "redHouse", "data1" : " Something"}
{"name" : "donna" , "version" : "4" , "groups" : "redHouse", "data1" : " value1"}

{"name" : "maggie" , "version" : "1" , "groups" : "greenHouse", "data1" : " value1"}
{"name" : "maggie" , "version" : "2" , "groups" : "greenHouse", "data1" : " Something"}
{"name" : "maggie" , "version" : "3" , "groups" : "greenHouse", "data1" : " Something"}
{"name" : "maggie" , "version" : "4" , "groups" : "greenHouse", "data1" : " Something"}

I would like to have query that searches for "value1" in the "data1" field and have it return two matches:

{"name" : "donna" , "version" : "4" , "groups" : "redHouse", "data1" : " value1"}
{"name" : "almara" , "version" : "4" , "groups" : "blueHouse", "data1" : " value1"}

This is by first grouping all documents based on the "groups" field and then only considering the documents that have the highest version number in each of the grouped documents.

How would a query for that look like?

I have tried using collapse, but it collapses the results the search by searching for "value1" in all the documents and then collapsing.

{
    "query": {
        "match": {
            "data1" :"value1"
        }
    },
    "collapse": {
        "field": "groups"
    },
    "sort": [
        {
            "version": {
                "order": "desc"
            }
        }
    ]
}

Not able to figure this out using aggregations as well. Please assist.

try it.

{
    "query": {
        "match": {
            "data1" :"value1"
        }
    },
    "collapse": {
        "field": "groups",
		"inner_hits": {
			"name": "versionDesc",
			"size": 2,
			"sort": [
				{
					"version": {
						"order": "desc"
					}
				}
			]
		}
    },
    "sort": [
        {
            "version": {
                "order": "desc"
            }
        }
    ],
	"size": 0
}

you can use inner_hits

Thanks @canelia

This does get the results. But If I change the search term to "Something", like in the query below, then I also get a result from an object that is version 3.

GET /test/_search
{
    "query": {
        "match": {
            "data1" :"Something"
        }
    },
    "collapse": {
        "field": "groups.keyword",
		"inner_hits": {
			"name": "versionDesc",
			"size": 2,
			"sort": [
				{
					"version": {
						"order": "desc"
					}
				}
			]
		}
    },
    "sort": [
        {
            "version": {
                "order": "desc"
            }
        }
    ],
	"size": 10
}

This returns
{"name" : "almara" , "version" : 3 , "groups" : "blueHouse", "data1" : " Something"}
However, it would be ideal if this is not part of the result as version 4 is the highest version number in the group "blueHouse".

Aggregations runs on the query result, so my attempt below also fails:

GET /test/_search
{
    "aggs": {
      "group_by_key": {
            "terms": {
                "field": "groups.keyword"
            },
            "aggs": {                
                "latest": {
                    "max": {
                        "field": "version"
                        
                    }
                }
            }
        }
    },
    "query": {
        "match": {
            "data1" :"Something"
        }
    },
    "collapse": {
        "field": "groups.keyword",
		"inner_hits": {
			"name": "versionDesc",
			"size": 2,
			"sort": [
				{
					"version": {
						"order": "desc"
					}
				}
			]
		}
    },
    "sort": [
        {
            "version": {
                "order": "desc"
            }
        }
    ],
	"size": 10
}

I thought I can get the max version and then have the application side check if the max version is part of the result and filter. But that doesn't seem to be the case here.

I can have two requests, but would be nice if it can all be in one.

Do you have other suggestions I can try?

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.