Hi All,
We're currently implementing our search with the help of the Elastic Stack, but we're having trouble providing accurate paging for our users.
Since we use collapsed hits/inner hits as part of our query, the total hit count in the response reflects the number of documents before collapsing, not the number of hits that will actually be returned. Without an accurate post-collapse total, it seems impossible to implement paging properly.
We have tried various workarounds; however, they all turn out to be approximations, which would be a bitter pill to swallow for us and our customers.
As a bit of background, we are collapsing on a single field (type_information.collapse_id), which always contains a GUID.
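For reference, the relevant part of our request looks roughly like this (the match_all query and the inner_hits settings are simplified placeholders, not our real ones):

{
  "from": 0,
  "size": 20,
  "query": { "match_all": {} },
  "collapse": {
    "field": "type_information.collapse_id",
    "inner_hits": {
      "name": "grouped",
      "size": 3
    }
  }
}

With this, hits.total.value still counts every matching document, while the hits array only contains one entry per collapse_id.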
So far, we have tried:
- Using a cardinality aggregation:
"collapsed_total": {
  "cardinality": {
    "field": "type_information.collapse_id",
    "precision_threshold": 40000
  }
}
However, since the cardinality aggregation is only an approximation, it was off by quite a bit on larger result sets, even with precision_threshold maxed out at 40000.
- Using a terms aggregation, and then subtracting the duplicates from the total hit count:
"collapsed_total": {
  "terms": {
    "field": "type_information.collapse_id",
    "min_doc_count": 2,
    "size": 10000
  }
}
The idea is to find every group that contains more than one document and subtract the surplus documents from hits.total (see the worked example after this list). However, this required us to raise the limit on search.max_buckets to unnecessarily high levels, and, if I understand it correctly, the bucket counts will also become inaccurate as soon as we move our development environment to the intended setup with multiple shards.
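To make the subtraction in the second workaround concrete (with made-up numbers): if hits.total.value is 12345 and the terms aggregation returns three buckets with doc_counts of 3, 2 and 2, the collapsed total would be 12345 - ((3-1) + (2-1) + (2-1)) = 12341.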
I understand that many of these issues only show up with a large number of results and should not occur when the search is used well, but we have a wide external user base with sometimes not very good "search skills".
Are there any ideas on how we could solve this in a way that gives us accurate totals, even for searches with a very large result set?
Cheers,
Stefano