A top_hits metric aggregator keeps track of the most relevant document being aggregated.
The top_hits aggregator isn’t a metric aggregator and therefore can’t be used in the order option of the terms aggregator.
It seems to me that there is an inconsistency on that page: one statement says that it is a metric aggregation and the other says the opposite.
To me, the top_hits aggregation looks more like a bucket one, because it yields a set of documents in almost the same way as, e.g., the filter aggregation does. I say this while setting aside the implementation under the hood, as I don't know it. Perhaps I would think otherwise if I were familiar with the sources.
I would appreciate it if someone could clarify this issue for me.
The best way to look at top_hits is to see the returned hits as a metric. Taking that into account, it is a metric agg. Also, top_hits can't have sub aggregations.
With full respect to the Elasticsearch team's position on this aggregation, it seems to me that such confusion would disappear if it were a bucket one, because it yields a set of documents. It looks more like a single-bucket aggregation.
It is really confusing that I must treat a set of documents as a metric, when it is more natural to see it as a bucket of documents.
I understand that it was called a metric because it shares some technical properties and limitations with other metric aggregations (like the restriction on sub aggregations). I don't want to dismiss anybody's idea, but calling top_hits a metric aggregation looks somewhat artificial to me. Perhaps it was forced by some inner details of the implementation.
I think that if the top_hits aggregation were a bucket aggregation, it would be more confusing.
How would this agg determine what the bucket would be? A single document, or all the documents that are returned? There is nothing identifying what a bucket is here. This is why I think it doesn't make sense. For example, the terms and range aggregations clearly identify what the buckets would be, based on properties inside the documents. The top_hits aggregation is just a way to peek inside the top matching documents for a particular bucket.
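To make the "peek inside a bucket" idea concrete, here is a minimal sketch of a search request body in which a terms aggregation defines the buckets and top_hits sits under it as a leaf. The helper name build_top_hits_request, the aggregation names by_term and top_docs, and the field name "tags" are assumptions made up for illustration; only the terms and top_hits keys and their field/size options come from the query DSL.

```python
import json


def build_top_hits_request(field: str, hits_per_bucket: int = 1) -> dict:
    """Build a search body: terms buckets with top_hits nested inside each.

    The names "by_term" and "top_docs" are arbitrary labels chosen here.
    """
    return {
        "size": 0,  # skip the normal hit list; only aggregations are wanted
        "aggs": {
            "by_term": {
                # The bucket aggregation: one bucket per distinct value of `field`.
                "terms": {"field": field},
                "aggs": {
                    # The leaf: top matching documents of each bucket.
                    # Nothing can be nested under top_hits.
                    "top_docs": {"top_hits": {"size": hits_per_bucket}}
                },
            }
        },
    }


if __name__ == "__main__":
    # "tags" is a hypothetical field name.
    print(json.dumps(build_top_hits_request("tags"), indent=2))
```

The bucket boundaries come entirely from the terms aggregation; top_hits only reports on whatever documents each bucket already holds.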
Now the distinction between the types of aggregations is clearer to me. As I understand it now, bucket aggregations yield one or more buckets. Each bucket is rendered in the result set as a doc_count field along with some others (e.g. key), but internally each bucket holds a set of documents which can be sub aggregated further.
Metric aggregations, by contrast, yield an object for each bucket as the result of some computation over the set of documents that the bucket holds. A metric aggregation ends with that object and does not hold any set of documents as bucket aggregations do, so it cannot be sub aggregated. For numeric aggregations that object is very plain: a number, or a set of numbers.
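A small sketch of how that difference shows up when reading a response. The sample below is hand-written for illustration, not real output: the aggregation names by_term, avg_price and top_docs are assumptions, and most response fields are omitted. It shows one bucket carrying a plain numeric metric object next to a top_hits object that wraps rendered documents.

```python
# Hand-written, trimmed illustration of an aggregation response.
sample_response = {
    "aggregations": {
        "by_term": {
            "buckets": [
                {
                    "key": "search",
                    "doc_count": 2,
                    # Numeric metric: the per-bucket object is a single value.
                    "avg_price": {"value": 15.0},
                    # top_hits: the per-bucket object is a list of rendered hits.
                    "top_docs": {
                        "hits": {
                            "hits": [
                                {"_id": "1", "_score": 1.2, "_source": {"title": "a"}}
                            ]
                        }
                    },
                }
            ]
        }
    }
}


def summarize(response: dict) -> list:
    """Return (key, doc_count, avg value, ids of top hits) for each bucket."""
    rows = []
    for bucket in response["aggregations"]["by_term"]["buckets"]:
        top_ids = [hit["_id"] for hit in bucket["top_docs"]["hits"]["hits"]]
        rows.append(
            (bucket["key"], bucket["doc_count"], bucket["avg_price"]["value"], top_ids)
        )
    return rows


if __name__ == "__main__":
    for row in summarize(sample_response):
        print(row)
```

Both avg_price and top_docs are leaves attached to the bucket; neither contains buckets of its own, which is the sense in which top_hits behaves like the other metric aggregations.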
I understand now what confused me. When I think about any metric, I expect it to be a number, or a set of numbers. Among all the metric aggregations now available in Elasticsearch, only the top_hits aggregation does not fit this concept (even the geo metric aggregations fit it). The structure of the top_hits aggregation result is far more complicated than a single number or a set of numbers, and it can contain not only numbers but anything that can appear in the source of a document. That is why those results do not look like a metric to me.
So, I understand that top_hits is really not a bucket aggregation, but it is hard for me to call its results a "metric". Generally a metric is a measure, so one expects to see a number or a set of numbers, not a rendered set of documents. That is really confusing for me.