SignificantTerms Agg : _superset_size greater than doc_count !?

tomlameche · September 17, 2015, 5:28pm

Yes, the definition of subset and superset is clear.
My problem is that the value of _superset_size in a script is greater than the total number of documents in my index. I think there is a bug somewhere.

I run a significant terms aggregation nested under a simple terms aggregation, similar to the example described here: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-significantterms-aggregation.html

I need to compute a custom score, so I tried to use "script_heuristic". The results looked strange to me, so I modified my script to inspect the value of each variable (_superset_size, _superset_freq, _subset_freq and _subset_size), for example with:
"script_heuristic": {
    "script": "_superset_size"
}

And what a surprise: the value of _superset_size is greater than the total number of documents in my index...
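
To make the expectation explicit, here is a minimal sketch in Python of the sanity checks I would expect the four script variables to pass, assuming the default background set (the whole index). The function name `check_significance_counts` and the parameter `index_doc_count` are hypothetical and not part of Elasticsearch; they only illustrate the relationships between the counts.

```python
def check_significance_counts(subset_freq, subset_size,
                              superset_freq, superset_size,
                              index_doc_count):
    """Return descriptions of the expectations that the given counts violate.

    Assumption: the background set is the whole index, so the foreground
    (subset) documents are also part of the background (superset).
    """
    problems = []
    # A term cannot occur in more documents than the set contains.
    if subset_freq > subset_size:
        problems.append("_subset_freq > _subset_size")
    if superset_freq > superset_size:
        problems.append("_superset_freq > _superset_size")
    # The foreground is assumed to be contained in the background.
    if subset_freq > superset_freq:
        problems.append("_subset_freq > _superset_freq")
    if subset_size > superset_size:
        problems.append("_subset_size > _superset_size")
    # The background cannot be larger than the index itself.
    if superset_size > index_doc_count:
        problems.append("_superset_size > index document count")
    return problems
```

With made-up numbers, `check_significance_counts(5, 100, 20, 1500, 1000)` reports only the last check, which is the situation I am seeing.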

In addition, the value of bg_count is greater than the total count for each term, as described in the topic "Bg_counts in nested significant_terms aggregation".

There is a bug, I guess.
