_source always zero in cardinality aggregation

autogenerated_id · October 2, 2020, 12:22pm

Thanks for the help, it's much appreciated! I didn't know about Debug.explain().

Let's dig into my use case step by step. Suppose an index has, among others, fields left and right. For starters, let's say we want to find out the number of distinct values in both left and right across all documents. That is, each value should be counted only once, even if it is found both in left and right. If we run two cardinality aggs and add the results together, some values may be counted twice. The only way I've found to do it (without returning all values and checking for duplicates by hand on the app side) is to run a scripted cardinality aggregation, where the script would be [doc.left, doc.right] (like here). This works fine.
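This is a minimal sketch of the counting problem. It uses made-up toy documents in plain Python rather than a real index. Summing two per-field distinct counts overcounts any value that appears in both fields, while a single set built over both fields does not. The helper shows one plausible shape for the scripted cardinality request body. The aggregation name `distinct_left_right` is hypothetical, and the script string is the one quoted above; it has not been verified here.

```python
# Toy documents standing in for the index; only the `left` and `right`
# fields from the post are modelled, and the values are made up.
docs = [
    {"left": "a", "right": "b"},
    {"left": "b", "right": "c"},
    {"left": "a", "right": "d"},
]

# Two separate cardinality aggs added together: "b" occurs in both
# fields, so it is counted twice.
distinct_left = {d["left"] for d in docs}
distinct_right = {d["right"] for d in docs}
summed = len(distinct_left) + len(distinct_right)

# What a scripted cardinality over both fields counts: every value from
# both fields goes into one set, so each value is counted once.
combined = len(distinct_left | distinct_right)


def scripted_cardinality_body(script_source: str) -> dict:
    """Request body for a cardinality aggregation driven by a script
    (the aggregation name is hypothetical)."""
    return {
        "size": 0,  # only the aggregation result is needed, not the hits
        "aggs": {
            "distinct_left_right": {
                "cardinality": {
                    "script": {"source": script_source, "lang": "painless"}
                }
            }
        },
    }


body = scripted_cardinality_body("[doc.left, doc.right]")
```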

However, the problem I'm dealing with is a bit more complex. I don't want to include all documents in the aggregation above. Instead, I want the left values from documents matching one criterion and the right values from documents matching another. So: take the index, apply filter1 to it and keep only the left values of the resulting documents. Then apply filter2 and keep only the right values of the matching documents. Finally, put all of these values into the same basket and count the distinct values.
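An exact, in-memory reference for the count described above can serve as a specification to check any Elasticsearch-based approach against. It is plain Python over toy documents. The `year` field and the two filters are hypothetical stand-ins for filter1 and filter2.

```python
from typing import Callable, Iterable

Doc = dict
Predicate = Callable[[Doc], bool]


def distinct_left_right(docs: Iterable[Doc], filter1: Predicate, filter2: Predicate) -> int:
    """Exact count of distinct values in one shared basket.

    The basket holds `left` from documents matching filter1 and `right`
    from documents matching filter2; a value found in both halves counts once.
    """
    docs = list(docs)  # materialise so the documents can be scanned twice
    basket = {d["left"] for d in docs if filter1(d)}
    basket |= {d["right"] for d in docs if filter2(d)}
    return len(basket)


# Toy data; the `year` field and both filters are hypothetical.
docs = [
    {"left": "a", "right": "b", "year": 2019},
    {"left": "b", "right": "c", "year": 2020},
    {"left": "a", "right": "d", "year": 2020},
]
count = distinct_left_right(
    docs,
    filter1=lambda d: d["year"] == 2020,  # contributes left values b, a
    filter2=lambda d: d["year"] == 2019,  # contributes right value b
)
```

Here the two halves contribute three values in total, but "b" appears in both, so the distinct count is 2.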

Initially, I wanted to use named queries for this and simply add an if to the script that would return doc.left, doc.right or both, depending on which queries matched. However, since this is unsupported, I tried abusing the document score to do the same thing (that is, to work out from the score which filters matched). That failed because of the behavior described in this thread. Too bad, given that performance-wise it works fine.
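The branching the script was meant to perform can be written out in plain Python to make the idea concrete. This is not Painless and does not run inside Elasticsearch. The score encoding (filter1 adds 1, filter2 adds 2, so 3 means both) is an assumed example, because the post does not say which encoding was tried. The sketch also shows what happens if the score seen by the script is always zero: it decodes to "no filter matched", so nothing is emitted.

```python
def matched_from_score(score: float) -> set:
    """Decode which filters matched, under a hypothetical encoding in which
    filter1 adds 1 to the score and filter2 adds 2 (so 3 means both)."""
    code = int(round(score))
    matched = set()
    if code & 1:
        matched.add("filter1")
    if code & 2:
        matched.add("filter2")
    return matched


def values_to_emit(doc: dict, matched: set) -> list:
    """The `if` the script was meant to contain: left for filter1, right for filter2."""
    values = []
    if "filter1" in matched:
        values.append(doc["left"])
    if "filter2" in matched:
        values.append(doc["right"])
    return values


doc = {"left": "a", "right": "b"}
both = values_to_emit(doc, matched_from_score(3.0))
only_left = values_to_emit(doc, matched_from_score(1.0))
# A score that is always 0 decodes to an empty set, so nothing is emitted.
nothing = values_to_emit(doc, matched_from_score(0.0))
```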

I could also use two filter aggregations and compute the cardinality in each of them, but then I wouldn't know how to combine the results while accounting for duplicates.
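The two-filter variant can be sketched as a request body. The field names `left` and `right` come from the post. The aggregation names and the `year` term filters are hypothetical. The inclusion-exclusion identity next to it shows exactly what is missing. The two cardinalities give |A| and |B|, but combining them correctly also needs the size of the overlap |A ∩ B|, which neither aggregation reports.

```python
def two_filter_cardinality_body(filter1: dict, filter2: dict) -> dict:
    """Two filter aggregations, each with a cardinality sub-aggregation
    (aggregation names are hypothetical)."""
    return {
        "size": 0,
        "aggs": {
            "filter1_docs": {
                "filter": filter1,
                "aggs": {"distinct_left": {"cardinality": {"field": "left"}}},
            },
            "filter2_docs": {
                "filter": filter2,
                "aggs": {"distinct_right": {"cardinality": {"field": "right"}}},
            },
        },
    }


def union_size(count_a: int, count_b: int, overlap: int) -> int:
    """Inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|."""
    return count_a + count_b - overlap


body = two_filter_cardinality_body({"term": {"year": 2020}}, {"term": {"year": 2019}})

# With A = {"a", "b"} (left values) and B = {"b"} (right values), the two
# counts are 2 and 1; the correct answer also needs the overlap |A ∩ B| = 1.
left_values, right_values = {"a", "b"}, {"b"}
naive = len(left_values) + len(right_values)
exact = union_size(len(left_values), len(right_values), len(left_values & right_values))
```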

The only accurate way I can think of is returning to the application all distinct values of left with filter1 applied, then doing the same for right and filter2, and finally computing the cardinality on the app side. This, however, would be too slow.
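The application-side approach can be sketched by assuming the distinct values are fetched page by page with a composite aggregation. That is one way to enumerate all distinct values; the post does not say how it would be done. `composite_body` builds a page request, and `collect_distinct` drains pages through a caller-supplied `fetch_page` function. Both `fetch_page` and `in_memory_fetcher` are hypothetical. `in_memory_fetcher` fakes pages instead of querying a cluster, and its `{"page": n}` after_key is a stand-in for the real one. The final count is the size of the union of the two sets.

```python
from typing import Callable, List, Optional, Tuple

Page = Tuple[List[str], Optional[dict]]  # (values on this page, after_key for the next)


def composite_body(field: str, query: dict, after_key: Optional[dict] = None,
                   size: int = 1000) -> dict:
    """One page of distinct `field` values among documents matching `query`."""
    composite = {"size": size, "sources": [{field: {"terms": {"field": field}}}]}
    if after_key is not None:
        composite["after"] = after_key  # resume after the last bucket already seen
    return {"size": 0, "query": query, "aggs": {"values": {"composite": composite}}}


def collect_distinct(fetch_page: Callable[[Optional[dict]], Page]) -> set:
    """Drain all pages. `fetch_page` is a hypothetical wrapper that sends
    composite_body(...) and returns the bucket values plus the after_key."""
    values: set = set()
    after_key = None
    while True:
        page, after_key = fetch_page(after_key)
        values.update(page)
        if not page or after_key is None:
            return values


def in_memory_fetcher(pages: List[List[str]]) -> Callable[[Optional[dict]], Page]:
    """Stand-in for a real search call, serving pre-split pages."""
    def fetch(after_key: Optional[dict]) -> Page:
        i = 0 if after_key is None else after_key["page"]
        if i >= len(pages):
            return [], None
        return pages[i], {"page": i + 1}
    return fetch


left = collect_distinct(in_memory_fetcher([["a", "b"], ["c"]]))  # left values, filter1
right = collect_distinct(in_memory_fetcher([["b", "d"]]))        # right values, filter2
total = len(left | right)
```

Every distinct value of both fields is transferred to the application, so the cost grows with the number of distinct values. That is consistent with this approach being too slow on a large index.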

If you think there's a better way to solve this with reasonable performance, even by running multiple queries, I'd be interested to know.
