What is the BM25 way to disable field length norms, as in TF/IDF?

In ES 1.x/2.x, we can disable the field length norm for a field as follows:

    "MyField" : {
      "type" : "string",
      "norms" : {
        "enabled" : false
      }
    }

Since ES 5.x uses BM25 by default, does the above setting still work in ES 5.2?

I am asking because of this statement in https://www.elastic.co/guide/en/elasticsearch/reference/current/norms.html

"Norms store various normalization factors that are later used at query time in order to compute the score of a document relatively to a query."

This statement does not mention "field length", whereas the ES 2.x doc does so explicitly:
https://www.elastic.co/guide/en/elasticsearch/reference/2.0/norms.html

So I just want to confirm: does disabling norms in ES 5.x have the same effect as in ES 2.x?

I found this https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-similarity.html

b: Controls to what degree document length normalizes tf values. The default value is 0.75.

I think this is what I want.

Coming back to the "norms" setting approach: it seems that it disables not only the field length norm but also all other normalization factors. So if I just want to disable the field length norm, I should use a custom similarity with b=0.

Can any expert confirm?

The ES 5 documentation is deliberately not too specific. Similarity algorithms may use norms in very different ways, not only for field-length-based normalization. But in ES 5 with BM25, a field length norm is used.

In an ES 5 field mapping, you can disable norm generation for the field with

  "norms": false

You can see this as a decision whether the field should contribute to the Similarity algorithm or not.
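For context, here is a minimal sketch of where that setting could sit in the body of an ES 5.x index-creation request. The field name `MyField` is taken from the question; the mapping type name `my_type` is only a placeholder assumed for illustration.

```json
{
  "mappings": {
    "my_type": {
      "properties": {
        "MyField": {
          "type": "text",
          "norms": false
        }
      }
    }
  }
}
```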

You are correct that BM25 uses not only field-based normalization but also document-based normalization.

Regarding document length normalization: it is inherent to the BM25 scoring formula. From the original Lucene BM25 paper, https://arxiv.org/abs/0911.5046:

Assigning 0 to b is equivalent to avoid the process of normalisation and therefore the document length will not affect the final score. If b takes 1, we will be carrying out a full length normalisation.
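To make the role of b concrete, here is the commonly cited textbook form of the BM25 score. The notation is an assumption for illustration and is not taken from this thread: f(t,d) is the frequency of term t in document d, |d| is the length of the field in terms, avgdl is the average field length, and k_1 is the term-frequency saturation parameter.

```latex
\mathrm{score}(q, d) = \sum_{t \in q} \mathrm{idf}(t) \cdot
  \frac{f(t,d)\,(k_1 + 1)}
       {f(t,d) + k_1 \left(1 - b + b \cdot \frac{|d|}{\mathrm{avgdl}}\right)}

% b = 0: the length term vanishes and the denominator is f(t,d) + k_1
% b = 1: the denominator is f(t,d) + k_1 \cdot |d| / \mathrm{avgdl}
```

With b = 0 the length |d| drops out of the formula entirely, which matches the statement quoted above.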

The factor b can be interpreted to control the strength of short documents being pushed to the top. 0.0 means do not push at all, 1.0 means full strength push.
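The custom similarity with b=0 proposed in the question might look roughly like the following index-creation body for ES 5.x. This is a sketch under assumptions: the similarity name `bm25_no_length_norm` and the mapping type `my_type` are placeholders made up for this example, and only `MyField` comes from the question.

```json
{
  "settings": {
    "index": {
      "similarity": {
        "bm25_no_length_norm": {
          "type": "BM25",
          "b": 0
        }
      }
    }
  },
  "mappings": {
    "my_type": {
      "properties": {
        "MyField": {
          "type": "text",
          "similarity": "bm25_no_length_norm"
        }
      }
    }
  }
}
```

Compared with `"norms": false`, this keeps norms stored for the field and only removes the influence of length on the BM25 score.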
