Is dfs_query_then_fetch relevant for BM25/ES 5.0?

drs · October 17, 2016, 5:48pm

I'm interested in getting accurate scores for queries that span multiple indices with distinct document types. I understand today I can use dfs_query_then_fetch to ensure the document frequencies are relevant to the whole corpus of documents, not just each shard.

How is this affected with the switch to BM25? Would dfs_query_then_fetch solve the same problem on queries across indices scored by BM25? Are there other terms of the calculation that need pre-querying in BM25 like document frequency needs in TF-IDF?
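For reference, the kind of request being asked about can be sketched as below. This is only an illustration: the host, the index names (`articles`, `comments`), the field, and the query text are made-up placeholders, and the helper `build_search_request` is hypothetical. It only builds the URL and body and does not contact a cluster.

```python
import json
from urllib.parse import urlencode


def build_search_request(host, indices, field, text,
                         search_type="dfs_query_then_fetch"):
    """Return (url, body) for a multi-index match query; nothing is sent."""
    # Several indices are addressed by comma-separating their names in the path.
    path = ",".join(indices) + "/_search"
    # search_type is passed as a URL query parameter.
    url = f"{host}/{path}?{urlencode({'search_type': search_type})}"
    body = json.dumps({"query": {"match": {field: text}}})
    return url, body


# Hypothetical host, indices, field and text, for illustration only.
url, body = build_search_request(
    "http://localhost:9200", ["articles", "comments"], "title", "bm25 scoring"
)
print(url)
print(body)
```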

Ivan · October 17, 2016, 7:09pm

Even with dfs_query_then_fetch, the values are still only calculated per
index, so it will not solve your problem for a multi-index search.

In the single index case, I think dfs_query_then_fetch is still beneficial.
BM25 will saturate the TF values sooner, but the value would still be
calculated per shard without it. Usually it just takes large indices for
the per-shard TF values to be good enough.

The Lucene BM25 parameters deal with term frequencies, not document
frequencies.
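To make that distinction concrete, here is a minimal sketch of a commonly cited form of BM25, not Lucene's actual code. The function names and all the numbers are made up for illustration; `k1=1.2` and `b=0.75` are the usual defaults. The parameters `k1` and `b` act only on the term-frequency part, while the IDF part still depends on document frequency and document count, so the score changes with the set of documents those statistics are gathered from. Note that in this form the average document length is also a collection-level input.

```python
import math


def bm25_idf(doc_freq, doc_count):
    """IDF part: depends only on collection statistics, not on k1 or b."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_tf(freq, doc_len, avg_doc_len, k1=1.2, b=0.75):
    """TF part: k1 controls saturation, b controls length normalization."""
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return freq * (k1 + 1) / (freq + norm)


def bm25_score(freq, doc_len, avg_doc_len, doc_freq, doc_count):
    return bm25_idf(doc_freq, doc_count) * bm25_tf(freq, doc_len, avg_doc_len)


# Same document, scored with two different sets of (made-up) statistics.
doc = dict(freq=3, doc_len=100, avg_doc_len=100)

# Statistics from one shard only: the term looks rare (5 of 1000 docs).
local = bm25_score(doc_freq=5, doc_count=1000, **doc)

# Statistics combined over two shards: the term is common on the other
# shard (500 of 1000 docs), so overall it is in 505 of 2000 docs.
combined = bm25_score(doc_freq=505, doc_count=2000, **doc)

print(f"score with local statistics:    {local:.3f}")
print(f"score with combined statistics: {combined:.3f}")

# The TF part saturates: it grows with freq but never reaches k1 + 1.
print([round(bm25_tf(f, 100, 100), 3) for f in (1, 2, 5, 50)])
```

The two printed scores differ even though the document itself is identical, which is the effect a DFS pre-query phase is meant to remove within the set of shards it covers.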

Ivan

drs · October 20, 2016, 5:00pm

Thanks for the explanation, that's quite helpful.
