question: Is there any way to build a query whose hit count has an upper limit N, so that an aggregation can be restricted to those top N results? And if so, how?
Just to clarify: the aggregation needs to be computed over the top hits of the scope query, rather than accessing the top hits of each bucket, which (if I am right) is what the top_hits aggregation provides. In other words, is it possible to have a sub-aggregation of the top_hits aggregation? If so, how?
Maybe you want the experimental sampler aggregation? I have an example
using US high school data that uses a sampler aggregation to answer
questions about students most similar to one under analysis. It uses a
sampler aggregation to learn the predominant characteristics of the N most
similar students.
(If you're interested, I'll be demoing this at ElasticOn.)
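
For anyone wanting to try the sampler approach, here is a rough sketch of what such a request body could look like, built as a plain Python dict so it can be sent with whatever client you use. The field names ("interests", "school_state"), the aggregation names and the helper function are made up for illustration and are not from the demo mentioned above. Note that shard_size is applied per shard, so on a multi-shard index the total number of sampled documents can be larger than N.

```python
import json


def build_top_n_aggregation(query, shard_size, field, agg_name="top_values"):
    """Build a search body that aggregates only over the best-scoring hits.

    Hypothetical helper: the sampler aggregation keeps the top-scoring
    documents (up to shard_size on each shard) and the nested terms
    aggregation then runs over that sample only.
    """
    return {
        "size": 0,  # return no hits, only the aggregation results
        "query": query,
        "aggs": {
            "top_n_sample": {
                # Per-shard cap on the number of top-scoring documents kept.
                "sampler": {"shard_size": shard_size},
                "aggs": {
                    # Sub-aggregation evaluated over the sampled documents only.
                    agg_name: {"terms": {"field": field}}
                },
            }
        },
    }


# Assumed example values: field names are placeholders, not a real mapping.
body = build_top_n_aggregation(
    query={"match": {"interests": "robotics"}},
    shard_size=100,
    field="school_state",
)
print(json.dumps(body, indent=2))
```

The buckets would then come back under aggregations.top_n_sample.top_values in the response. On a single-shard index, shard_size behaves as the N asked about in the question.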