Return document for aggregation result

This may be a stupid question, and probably an easy one for experts:

I want to perform a max aggregation and get the document that corresponds to the result:

POST /sales/_search?size=1
{
    "aggs" : {
        "max_price" : {
            "max" : {
                "field" : "price"
            }
        }
    }
}

Right now, the document returned is different from the one that has the max price. If I want to return the document with the max price, how can that be achieved?

You can get that if you [sort your search results](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-sort.html) by price.

You have already set the size to 1 to retrieve one doc alongside your agg results, so sorting by price in descending order should ensure it is the doc with the highest price.
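
For illustration, the two could be combined in one request along these lines. This is a sketch that reuses the `sales` index and `price` field from the question, and it assumes `price` is a numeric field with doc values enabled (the default for numeric fields).

```
POST /sales/_search?size=1
{
  "sort": [
    { "price": { "order": "desc" } }
  ],
  "aggs": {
    "max_price": {
      "max": { "field": "price" }
    }
  }
}
```

The single hit in `hits.hits` should then be a document whose `price` equals the `max_price` value. If several documents share the maximum, any one of them may be returned.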

I thought about sorting, but isn't it an expensive operation since it uses more memory compared to max aggregation?

Isn't there a way to return the document with the above query?

The max agg accesses the same doc-value lookup mechanism as the sorting logic does and for the same number of matching docs.
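
If the aim is to keep everything inside `aggs`, a `top_hits` aggregation sorted by price is one option. The request below is only a sketch: the aggregation name `max_price_doc` is made up for illustration, and since `top_hits` sorts as well, it should not be expected to be cheaper than a top-level sort.

```
POST /sales/_search?size=0
{
  "aggs": {
    "max_price_doc": {
      "top_hits": {
        "size": 1,
        "sort": [
          { "price": { "order": "desc" } }
        ]
      }
    }
  }
}
```

The matching document would appear under `aggregations.max_price_doc.hits.hits` rather than in the top-level hits.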

Understood. Somehow I thought that Elasticsearch keeps additional metadata for each shard like MemSQL does, especially for min/max values. Is there any documentation I can reference to understand the details of how the max aggregation and/or sorting work internally? I would love to dive deep here.

I will use sorting then. Thanks for your quick reply and help, I really appreciate it. Here is what the query looks like with sort, in case someone comes across the same question in the future.

GET /bank/_search?size=1
{
    "sort" : { "price" : {"order" : "desc"} }
}

The max agg is designed for use in a context that would prevent it from making use of any such pre-computed global max values. See "Very slow aggregation performance for trivial aggs" (reply #2 by Mark_Harwood).
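
As a sketch of what that context typically looks like: the max is computed only over the documents that match the query, so a global maximum stored per shard would not answer it. The `category` field and the value `books` are made up for illustration.

```
POST /sales/_search?size=0
{
  "query": {
    "term": { "category": "books" }
  },
  "aggs": {
    "max_price": {
      "max": { "field": "price" }
    }
  }
}
```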

That makes sense.

I would argue a bit here, though. Global min/max values would only be expected to be used in the normal case without additional filters, which is a very common scenario. I am not saying that it is easy to implement (I come from a storage background as well).

Row-level security is also a common requirement, and any pre-aggregated values that go unfiltered in a store can represent a security risk. Each one of these values would need an "if security enabled do X, else do Y" safeguard added around it.

Fair enough. I have no idea why pre-aggregated values would require security, but then I don't deal with security, so pardon my lack of knowledge in that domain.

GET /bank/accounts/_search
{
	"aggs" : { 
		"any-oligarchs-hiding-here?" : {
			"max" : { "field":"balance"} 
		}
	}
}

:)
