Is dfs_query_then_fetch relevant for BM25/ES 5.0?


I'm interested in getting accurate scores for queries that span multiple indices with distinct document types. I understand today I can use dfs_query_then_fetch to ensure the document frequencies are relevant to the whole corpus of documents, not just each shard.

How is this affected with the switch to BM25? Would dfs_query_then_fetch solve the same problem on queries across indices scored by BM25? Are there other terms of the calculation that need pre-querying in BM25 like document frequency needs in TF-IDF?

(Ivan Brusic) #2

Even with dfs_query_then_fetch, the values are still only calculated per
index, so it will not solve your problem for a multi-index search.

In the single index case, I think dfs_query_then_fetch is still beneficial.
BM25 will saturate the TF values sooner, but without it the statistics
would still be calculated per shard. Usually it just takes a large index
for the per-shard values to be representative.
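
To make the per-shard point concrete, here is a rough sketch of the two
halves of a BM25 score, assuming the Lucene-style formulas with the default
k1 = 1.2 and b = 0.75. The shard sizes and document frequencies are made up
for illustration, and the helper names are hypothetical, not Elasticsearch
or Lucene APIs.

```python
import math


def bm25_idf(doc_freq, doc_count):
    # IDF part: depends on corpus statistics (document frequency and
    # document count), which is what differs from shard to shard.
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf(freq, doc_len, avg_doc_len, k1=1.2, b=0.75):
    # TF part: saturates as freq grows and never exceeds k1 + 1.
    # k1 and b only shape this part, not the IDF.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return freq * (k1 + 1) / (freq + norm)


# Hypothetical shards: (documents on the shard, documents containing the term)
shards = [(1000, 5), (1000, 200)]

# Without a DFS phase, each shard scores with its own statistics.
per_shard_idf = [bm25_idf(df, n) for n, df in shards]

# With a DFS phase, the statistics are summed before scoring.
global_idf = bm25_idf(
    sum(df for _, df in shards),
    sum(n for n, _ in shards),
)

print("per-shard IDF:", per_shard_idf)
print("global IDF:   ", global_idf)
print("TF part at freq=1 and freq=100:",
      bm25_tf(1, 10, 10), bm25_tf(100, 10, 10))
```

The same term gets a very different IDF on each shard, while the TF part
flattens out quickly no matter which statistics are used.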

The Lucene BM25 parameters deal with term frequencies, not document
frequencies.



Thanks for the explanation, that's quite helpful.
