Filter_duplicate_text doesn't run on background

Eran_H · May 15, 2019, 12:53pm

When I set "filter_duplicate_text": true in a significant_text aggregation, it changes only the doc_count and not the bg_count.
As a result, the score treats some documents in which the word does appear as documents in which it does not appear.
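
For reference, here is a minimal sketch of the kind of request body I mean, written as a Python dict. The field name "content", the match query, and the aggregation names "sample" and "keywords" are placeholders for illustration, not taken from a real index.

```python
import json

# Hypothetical request body: field name, query text, and aggregation
# names are assumptions made for this example.
body = {
    "query": {"match": {"content": "some search text"}},
    "aggs": {
        "sample": {
            # significant_text is usually wrapped in a sampler so that
            # only the top-matching documents are re-analyzed.
            "sampler": {"shard_size": 100},
            "aggs": {
                "keywords": {
                    "significant_text": {
                        "field": "content",
                        # The setting this topic is about.
                        "filter_duplicate_text": True,
                    }
                }
            },
        }
    },
}

print(json.dumps(body, indent=2))
```

Each bucket in the response carries a doc_count and a bg_count, which are the two numbers discussed here.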

This is problematic because a word that appears often in duplicated text gets a significantly lower score than it should.
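
To make the effect concrete, here is a rough sketch that assumes the JLH heuristic (absolute change in frequency multiplied by relative change). All counts are invented for illustration, and the third case, where the background is deduplicated too, is hypothetical rather than something the aggregation does.

```python
def jlh_score(doc_count, subset_size, bg_count, superset_size):
    """JLH-style score: absolute change times relative change in frequency."""
    fg = doc_count / subset_size      # foreground frequency
    bg = bg_count / superset_size     # background frequency
    if fg <= bg:
        return 0.0
    return (fg - bg) * (fg / bg)


# Invented numbers: 100 foreground docs, 10,000 background docs.
SUBSET_SIZE, SUPERSET_SIZE = 100, 10_000

# The word occurs in 40 foreground docs, 30 of them only in duplicated text.
raw_fg, dedup_fg = 40, 10
# It occurs in 200 background docs, 150 of them only in duplicated text.
raw_bg, dedup_bg = 200, 50

# No filtering at all.
unfiltered = jlh_score(raw_fg, SUBSET_SIZE, raw_bg, SUPERSET_SIZE)
# Behaviour described in this post: only doc_count is deduplicated.
fg_only = jlh_score(dedup_fg, SUBSET_SIZE, raw_bg, SUPERSET_SIZE)
# Hypothetical: both counts deduplicated.
both = jlh_score(dedup_fg, SUBSET_SIZE, dedup_bg, SUPERSET_SIZE)

print(f"unfiltered={unfiltered:.2f} fg_only={fg_only:.2f} both={both:.2f}")
```

With these made-up counts, deduplicating only the foreground gives the lowest score of the three cases, which is the drop described above.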

What are your thoughts regarding this?

Thanks

