Filter_duplicate_text doesn't run on background

(Eran H) #1

When "filter_duplicate_text": true is set in a significant_text aggregation, it changes only the doc_count and not the bg_count.
This means the score treats documents in which the word appears as documents in which it does not appear.
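
For context, a request of the kind described could be built roughly as in the sketch below. The field name `content`, the aggregation name `keywords`, and the query text are hypothetical placeholders; only `significant_text` and `filter_duplicate_text` come from the setting discussed here.

```python
import json


def build_request(field: str, query_text: str) -> dict:
    """Build a search body with a significant_text aggregation.

    The field and query text are placeholders, not taken from a real index.
    """
    return {
        # The query selects the foreground set of documents.
        "query": {"match": {field: query_text}},
        "aggs": {
            "keywords": {
                "significant_text": {
                    "field": field,
                    # The option under discussion in this thread.
                    "filter_duplicate_text": True,
                }
            }
        },
    }


body = build_request("content", "example query")
print(json.dumps(body, indent=2))
```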

This is problematic because a word that appears often in duplicated text gets a significantly lower score than it should.
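
To illustrate the effect with made-up numbers, here is a small sketch that assumes a JLH-style score (absolute change times relative change between foreground and background percentages). All counts are invented for illustration, and the function is a simplification, not the actual Elasticsearch implementation.

```python
def jlh_score(fg_count: int, fg_total: int, bg_count: int, bg_total: int) -> float:
    """Simplified JLH-style score: (fg% - bg%) * (fg% / bg%)."""
    fg_pct = fg_count / fg_total
    bg_pct = bg_count / bg_total
    # Terms that are not more frequent in the foreground get no score.
    if bg_pct == 0 or fg_pct <= bg_pct:
        return 0.0
    return (fg_pct - bg_pct) * (fg_pct / bg_pct)


# Hypothetical counts: the word occurs mostly inside duplicated text.
fg_total, bg_total = 100, 10_000
fg_count_filtered = 10      # foreground doc_count after duplicates are filtered
bg_count_unfiltered = 400   # background bg_count still includes the duplicates
bg_count_filtered = 100     # what bg_count would be if it were filtered too

mismatched = jlh_score(fg_count_filtered, fg_total, bg_count_unfiltered, bg_total)
consistent = jlh_score(fg_count_filtered, fg_total, bg_count_filtered, bg_total)
print(f"filtered doc_count vs unfiltered bg_count: {mismatched:.3f}")
print(f"filtered doc_count vs filtered bg_count:   {consistent:.3f}")
```

With these invented counts, the mismatched case scores lower than the consistent one, which is the behaviour described above.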

What are your thoughts regarding this?

Thanks