Is top_hits aggregation really a metric one?

golubev · April 18, 2016, 6:33pm

Hello!

I am confused by the top_hits aggregation. It is not clear to me whether it is a metric aggregation.

The top_hits reference page contains these two conflicting statements:

A top_hits metric aggregator keeps track of the most relevant document being aggregated.

The top_hits aggregator isn’t a metric aggregator and therefore can’t be used in the order option of the terms aggregator.

It seems to me that there is a bit of inconsistency on that page: one statement says that it is a metric aggregation and the other says the opposite.

To me, the top_hits aggregation looks more like a bucket aggregation, because it yields a set of documents in almost the same way as, e.g., the filter aggregation does. I am setting aside the implementation under the hood, as I don't know it. Perhaps I would think differently if I were familiar with the source code.

I would appreciate it if someone could clarify this for me.

mvg · April 19, 2016, 9:52am

That documentation is confusing; I've updated it.

The best way to look at top_hits is to see the returned hits as a metric. Taking that into account, it is a metric agg. Also, top_hits can't have sub-aggregations.

golubev · April 19, 2016, 10:50am

Thanks, @mvg!

With full respect to the Elasticsearch team's position on this aggregation, it seems to me that such confusion would disappear if it were a bucket aggregation, because it yields a set of documents. It looks more like a single-bucket aggregation.

What is really confusing is that I must treat a set of documents as a metric, while it is more natural to see it as a bucket of documents.

I understand that it was called a metric because it shares some technical properties and limitations with other metric aggregations (like the restriction on sub-aggregations). I don't want to offend anybody, but calling top_hits a metric aggregation looks somewhat artificial to me. Perhaps it was forced by some inner details of the implementation.

Thanks for hearing me out!

mvg · April 20, 2016, 1:12pm

I think that if the top_hits aggregation were a bucket aggregation, it would be more confusing.

How would this agg determine what the bucket is? A single document, or all the documents that are returned? There is nothing identifying what a bucket is here. This is why I think it doesn't make sense. For example, the terms and range aggregations clearly identify what the buckets are, based on properties inside documents. The top_hits aggregation is just a way to peek inside the top matching documents for a particular bucket.
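
To make the "peek inside a bucket" idea concrete, here is a minimal sketch of a search body, written as a Python dict, in which a terms aggregation defines the buckets and top_hits sits underneath it as a leaf. The field names (`category`, `price`) and the aggregation names (`by_term`, `max_value`, `top_docs`) are hypothetical and not taken from this thread. Because top_hits cannot be used in the `order` option, the sketch orders the buckets by a sibling max aggregation instead.

```python
import json


def top_hits_per_term(term_field, sort_field, order_field, hits_per_bucket=1):
    """Build a search body: terms buckets, each with a max metric and a top_hits peek."""
    return {
        "size": 0,  # skip the regular hit list; only aggregations are wanted
        "aggs": {
            "by_term": {
                # The terms aggregation decides what the buckets are.
                "terms": {
                    "field": term_field,
                    # Buckets are ordered by the numeric sibling metric,
                    # not by top_hits, which cannot appear in "order".
                    "order": {"max_value": "desc"},
                },
                "aggs": {
                    "max_value": {"max": {"field": order_field}},
                    # top_hits is a leaf: it takes no "aggs" of its own.
                    "top_docs": {
                        "top_hits": {
                            "size": hits_per_bucket,
                            "sort": [{sort_field: {"order": "desc"}}],
                        }
                    },
                },
            }
        },
    }


# Hypothetical field names, for illustration only.
body = top_hits_per_term("category", "price", "price")
print(json.dumps(body, indent=2))
```

The printed JSON could be sent as the body of a `_search` request; the sketch itself does not contact a cluster.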

golubev · April 21, 2016, 6:19pm

Thanks, Martijn @mvg!

Now the distinction between types of aggregations is clearer to me. As I understand it now, bucket aggregations yield one or more buckets. Each bucket is rendered in the result set as a doc_count field along with some others (e.g. key), but internally each bucket holds a set of documents which can be sub-aggregated further.

Metric aggregations, in contrast, yield an object for each bucket as the result of some computation over the set of documents that the bucket holds. A metric aggregation ends with that object and does not hold any set of documents the way a bucket aggregation does, so it cannot be sub-aggregated. For numeric aggregations that object is very plain: a number, or a set of numbers.

I understand now what confused me. When I think about a metric, I expect it to be a number, or a set of numbers. Among all the metric aggregations now available in Elasticsearch, only the top_hits aggregation does not fit this concept (even the geo metric aggregations fit it). The structure of the top_hits result is much more complicated than a single number or a set of numbers, and it can contain more than numbers (anything that can appear in the source of a document). That is why those results do not look like a metric to me.

So I understand that top_hits is really not a bucket aggregation, but it is hard for me to call its results a "metric". Generally a metric is a measure, so one expects to see a number or a set of numbers, not a rendered set of documents. That is what confuses me.
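
The difference in result shape can be sketched with a hand-written, trimmed response. The sample below is illustrative rather than output captured from a cluster, and the names `by_term`, `max_value` and `top_docs` are hypothetical; a real response carries more fields per hit. Each bucket exposes `key` and `doc_count`, the max metric collapses to a single `value`, and top_hits returns rendered documents.

```python
# Hand-written, trimmed response used only to show the shapes involved.
sample_response = {
    "aggregations": {
        "by_term": {
            "buckets": [
                {
                    "key": "books",
                    "doc_count": 3,
                    "max_value": {"value": 42.0},
                    "top_docs": {
                        "hits": {
                            "hits": [
                                {"_id": "7", "_source": {"title": "A", "price": 42.0}}
                            ]
                        }
                    },
                },
                {
                    "key": "games",
                    "doc_count": 1,
                    "max_value": {"value": 19.5},
                    "top_docs": {
                        "hits": {
                            "hits": [
                                {"_id": "2", "_source": {"title": "B", "price": 19.5}}
                            ]
                        }
                    },
                },
            ]
        }
    }
}


def summarize(response, agg_name="by_term"):
    """Flatten each bucket into its key, doc_count, numeric metric and top documents."""
    rows = []
    for bucket in response["aggregations"][agg_name]["buckets"]:
        rows.append(
            {
                "key": bucket["key"],
                "doc_count": bucket["doc_count"],
                # Numeric metric: a single number per bucket.
                "max_value": bucket["max_value"]["value"],
                # top_hits: a list of document sources per bucket.
                "top_sources": [
                    hit["_source"] for hit in bucket["top_docs"]["hits"]["hits"]
                ],
            }
        )
    return rows


for row in summarize(sample_response):
    print(row)
```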

Thanks again, Martijn!
