Why does IDF differ between hits for the same query?


I'm trying to understand how scoring affects my search results, but I seem to misunderstand the IDF term.

When querying a single index with a single document type, the values used to calculate the IDF are not the same for all results.

As I understand it, they should be the same, since the IDF is calculated from how many of the overall documents the search term appears in.

"description": "weight(file_content:manual in 0) [PerFieldSimilarity], result of:",
"details": [
    "value": 0.055544302,
    "description": "score(doc=0,freq=1.0), product of:",
    "details": [
        "value": 0.5771883,
        "description": "queryWeight, product of:",
        "details": [
            "value": 3.0794415,
            "description": "idf(docFreq=1, maxDocs=16)",
            "details": []
            "value": 0.18743278,
            "description": "queryNorm",
            "details": []

In the next hit, docFreq and maxDocs are, for example, 1 and 8, even though it is the same search term and the same index.
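To make the difference concrete, here is a minimal sketch of how the two IDF values compare. It assumes the explain output comes from Lucene's classic TF-IDF similarity, where idf = 1 + ln(maxDocs / (docFreq + 1)); that formula reproduces the 3.0794415 shown above, but the function name `classic_idf` is only illustrative.

```python
import math

def classic_idf(doc_freq: int, max_docs: int) -> float:
    """IDF as in Lucene's classic similarity (assumed here):
    1 + ln(maxDocs / (docFreq + 1))."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

# Values taken from the explain output of the two hits.
idf_first_hit = classic_idf(doc_freq=1, max_docs=16)
idf_next_hit = classic_idf(doc_freq=1, max_docs=8)

# queryNorm as reported in the explain output above.
query_norm = 0.18743278
query_weight = idf_first_hit * query_norm

print(f"idf(docFreq=1, maxDocs=16) = {idf_first_hit:.7f}")
print(f"idf(docFreq=1, maxDocs=8)  = {idf_next_hit:.7f}")
print(f"queryWeight (first hit)    = {query_weight:.7f}")
```

The same term with the same docFreq gets a lower IDF when maxDocs is 8 instead of 16, which is enough to change the ranking.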

How is that possible?

Maybe that hit comes from a different shard? Have you tried setting the search_type parameter to dfs_query_then_fetch? This computes the distributed docFreq, so that all shards work with the same docFreq when computing the score.
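A rough sketch of what such a request could look like. It only builds the URL and JSON body and does not send anything; the host `http://localhost:9200` and the index name `my_index` are placeholders, and `build_search_request` is a hypothetical helper, not part of any client library. The field and term are the ones from the explain output above.

```python
import json
from urllib.parse import urlencode

def build_search_request(base_url: str, index: str, field: str, term: str):
    """Build the URL and JSON body for a search that uses
    search_type=dfs_query_then_fetch (hypothetical helper)."""
    params = urlencode({"search_type": "dfs_query_then_fetch"})
    url = f"{base_url}/{index}/_search?{params}"
    body = json.dumps({
        "explain": True,  # include the score explanation for each hit
        "query": {"match": {field: term}},
    })
    return url, body

# Placeholder host and index name; adjust to your own cluster.
url, body = build_search_request(
    "http://localhost:9200", "my_index", "file_content", "manual"
)
print(url)
print(body)
```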

OK - here comes the answer...

maxDocs only refers to the documents in the same shard as the hit came from.

This has a significant impact on the final score in my test case, where I only have a few documents indexed in my cluster and am running a multi_match query with two search terms.

The most relevant document, where both search terms appear in the searched field, is only presented as hit number 3, even though all logic says it should be number 1. Hits number 1 and 2 only contain the first search term.

This happens because the documents are indexed on 3 different shards, one of the search terms appears in only 1 document on each shard, and at the same time the number of documents per shard varies from 9 to 31.

This of course gives a difference in the overall scoring.
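A small sketch of the effect, again assuming Lucene's classic formula idf = 1 + ln(maxDocs / (docFreq + 1)). The shard sizes 9 and 31 are from my test case; the middle value of 20 is made up for illustration.

```python
import math

def classic_idf(doc_freq: int, max_docs: int) -> float:
    """IDF as in Lucene's classic similarity (assumed here)."""
    return 1.0 + math.log(max_docs / (doc_freq + 1))

# One matching document per shard; the shard with 20 documents is hypothetical.
shards = [
    {"max_docs": 9, "doc_freq": 1},
    {"max_docs": 20, "doc_freq": 1},
    {"max_docs": 31, "doc_freq": 1},
]

# Default behaviour: each shard scores with its own local statistics.
per_shard_idf = [classic_idf(s["doc_freq"], s["max_docs"]) for s in shards]

# Global statistics summed over all shards, which is roughly what
# dfs_query_then_fetch achieves.
global_idf = classic_idf(
    doc_freq=sum(s["doc_freq"] for s in shards),
    max_docs=sum(s["max_docs"] for s in shards),
)

for shard, idf in zip(shards, per_shard_idf):
    print(f"shard with {shard['max_docs']:>2} docs: idf = {idf:.4f}")
print(f"global statistics:  idf = {global_idf:.4f}")
```

With local statistics the same term gets three different IDF values, so a hit from the largest shard is boosted relative to one from the smallest shard purely because of where it was indexed.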

Shouldn't maxDocs ideally be calculated at cluster level?

Ah, sorry your post came before mine :slight_smile:

Great - I'll try that.


This blog post, https://www.elastic.co/blog/understanding-query-then-fetch-vs-dfs-query-then-fetch, gives a good explanation of why this situation happens. All in all, it concludes that this most probably won't occur if the cluster contains enough data.

In development phases, some might not have more than 100 documents in their cluster, so the lesson must be to have closer to the number of documents expected in the production environment (or at least "many" documents on each shard) before trying to tweak the search results.