I am using the field collapsing feature to group my results by a certain field and get only the top ranked hits.
In the documentation, it is mentioned that "The total number of hits in the response indicates the number of matching documents without collapsing. The total number of distinct group is unknown."
The problem is that I cannot paginate correctly, since the total hits count that is returned includes every matching document (the collapsed top hits plus the inner_hits of each group), not just the collapsed top hits.
Is there a recommended way to get the total hits as just the number of top hits (i.e. the number of distinct groups)?
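To make the documentation's statement concrete, here is a simplified in-memory model of field collapsing. It is only an illustration, not how Elasticsearch implements collapsing: the documents, the `user` field and the `score` values are made up. It keeps the best-scoring hit per distinct field value and shows that the total hit count differs from the number of groups actually returned.

```python
from typing import Dict, List


def collapse(hits: List[Dict], field: str) -> List[Dict]:
    """Keep only the highest-scoring hit per distinct value of `field`.

    A simplified model of field collapsing (assumption), not Elasticsearch's
    actual implementation.
    """
    best: Dict[object, Dict] = {}
    for hit in hits:
        key = hit[field]
        if key not in best or hit["score"] > best[key]["score"]:
            best[key] = hit
    # Order the surviving group leaders by score, highest first.
    return sorted(best.values(), key=lambda h: h["score"], reverse=True)


# Hypothetical matching documents; "user" plays the role of the collapse field.
matching = [
    {"id": 1, "user": "a", "score": 3.0},
    {"id": 2, "user": "a", "score": 2.5},
    {"id": 3, "user": "b", "score": 2.0},
    {"id": 4, "user": "c", "score": 1.5},
    {"id": 5, "user": "b", "score": 1.0},
]

total_hits = len(matching)           # counts every matching document
groups = collapse(matching, "user")  # one hit per distinct "user" value
print(total_hits, len(groups))       # prints: 5 3
```

The gap between the two numbers is exactly why a hits total cannot be used to compute the number of pages of collapsed results.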
Thanks for posting. I'm running into the same issue of total_hits not being correct. We got around the paging issue by requesting one extra result per page and using it to determine whether there is a next page. This obviously doesn't work for jumping more than one page forward, but our users are OK with that. The bigger issue for us is the confusion caused when the result count is drastically different from the number of results actually displayed.
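The extra-result trick can be sketched as follows. This is a minimal sketch, assuming a hypothetical `fetch(offset, size)` callable that runs the collapsed search with `from=offset` and `size=size` and returns the list of group leaders; a plain Python list stands in for the index here.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def fetch_page(
    fetch: Callable[[int, int], Sequence[T]], page: int, page_size: int
) -> Tuple[List[T], bool]:
    """Return one page of results plus a flag telling whether a next page exists.

    `fetch(offset, size)` is a hypothetical stand-in for a collapsed search
    sent with from=offset and size=size.
    """
    offset = page * page_size
    # Ask for one extra result: if it comes back, at least one more page exists.
    results = list(fetch(offset, page_size + 1))
    has_next = len(results) > page_size
    # Only show page_size results; the extra one is just a look-ahead.
    return results[:page_size], has_next


# Usage against an in-memory list of 7 collapsed results.
data = list(range(7))


def fake_fetch(offset: int, size: int) -> List[int]:
    return data[offset:offset + size]


print(fetch_page(fake_fetch, 0, 3))  # ([0, 1, 2], True)
print(fetch_page(fake_fetch, 2, 3))  # ([6], False)
```

The cost is a single extra document per request, but, as noted above, it only tells you whether the next page exists, not how many pages there are.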
Field collapsing is an awesome feature, but the total_hits issue really limits its usefulness for us.
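It is possible to use a cardinality aggregation on the collapsed field to get the number of distinct groups, i.e. the number of collapsed results. The cardinality aggregation returns an approximate count (its accuracy can be tuned with precision_threshold), so for very large numbers of groups the total may be slightly off. Add it alongside the collapse in the search body: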
{
  "collapse": {
    "field": "name-of-collapsed-field"
  },
  "aggs": {
    "total": {
      "cardinality": {
        "field": "name-of-collapsed-field"
      }
    }
  }
}
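Building on that aggregation, the sketch below shows how a client might build the search body and derive the number of pages from the aggregation's `value` instead of from the hits total. The helper names `build_body` and `total_pages` are hypothetical, the field name is the placeholder from the post, and the sample response is hand-written and trimmed to the aggregation part only.

```python
import math
from typing import Any, Dict

COLLAPSE_FIELD = "name-of-collapsed-field"  # placeholder field name from the post


def build_body(page: int, page_size: int) -> Dict[str, Any]:
    """Search body that collapses on COLLAPSE_FIELD and counts distinct groups."""
    return {
        "from": page * page_size,
        "size": page_size,
        "collapse": {"field": COLLAPSE_FIELD},
        "aggs": {"total": {"cardinality": {"field": COLLAPSE_FIELD}}},
    }


def total_pages(response: Dict[str, Any], page_size: int) -> int:
    """Derive the page count from the (approximate) group count, not hits.total."""
    groups = response["aggregations"]["total"]["value"]
    return math.ceil(groups / page_size)


# Hand-written, trimmed response: only the aggregation result is kept.
sample_response = {"aggregations": {"total": {"value": 3}}}
print(total_pages(sample_response, page_size=2))  # 2
```

Since the cardinality value is an estimate, the last page computed this way may occasionally come back short or empty when there are very many groups.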