When collapsing results, the total number of hits (with track_total_hits) doesn't take collapsing into account: the total number of matching documents is returned, not the number of collapsed groups.
For example, if I index 1508540 cars from 116 brands, this query returns 1508540 in hits.total.value:
GET cars/_search?track_total_hits=true
{
  "collapse": {
    "field": "brand"
  }
}
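To make the difference concrete, here is a toy illustration in plain Python. It is not Elasticsearch code: the sample cars are made up, the `collapse` helper is my own, and it assumes the documents are already in sort order so that "first seen" stands in for the top hit of each group.

```python
# Made-up sample documents; the "brand" field mirrors the query above.
cars = [
    {"brand": "audi", "model": "a3"},
    {"brand": "audi", "model": "a4"},
    {"brand": "bmw", "model": "x1"},
    {"brand": "fiat", "model": "500"},
    {"brand": "fiat", "model": "panda"},
]


def collapse(docs, field):
    """Keep the first document seen for each distinct value of `field`."""
    groups = {}
    for doc in docs:
        # setdefault only stores the document if the key is not present yet
        groups.setdefault(doc[field], doc)
    return list(groups.values())


total_hits = len(cars)               # analogous to hits.total.value
collapsed = collapse(cars, "brand")  # analogous to the collapsed hits
group_count = len(collapsed)         # the number I am actually after

print(total_hits, group_count)
```

With this data the two numbers differ (5 documents, 3 groups), which is the same gap as 1508540 versus 116 in my index.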
If I want to count groups, I can use the cardinality aggregation.
This query returns the expected 116 in aggregations.brands_count.value, but it is really slow on large indexes, even with a low precision_threshold:
GET cars/_search?size=0&track_total_hits=false
{
  "aggs": {
    "brands_count": {
      "cardinality": {
        "field": "brand",
        "precision_threshold": 150
      }
    }
  }
}
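For reference, this is roughly how I build that request body and read the count from a script. It is a minimal sketch in plain Python with no client library: the helper names `cardinality_body` and `group_count` are my own, and `sample_response` is a hand-written stand-in trimmed to the one field being read, not real output.

```python
import json


def cardinality_body(field, precision_threshold, agg_name="brands_count"):
    """Build the search body for a cardinality aggregation on `field`."""
    return {
        "aggs": {
            agg_name: {
                "cardinality": {
                    "field": field,
                    "precision_threshold": precision_threshold,
                }
            }
        }
    }


def group_count(response, agg_name="brands_count"):
    """Read the (approximate) distinct count out of a search response."""
    return response["aggregations"][agg_name]["value"]


body = cardinality_body("brand", 150)
print(json.dumps(body, indent=2))

# Hand-written stand-in for a response, not actual cluster output.
sample_response = {"aggregations": {"brands_count": {"value": 116}}}
print(group_count(sample_response))
```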
I could create a brands index, but in this example I need to sort the collapsed results by individual car features, so it would not help me.
I found some GitHub issues discussing this (here and here), but they are from 2017.
Is there a way to count groups efficiently now? Are there any known workarounds?