Are the queryNorm and coord factors no longer applied to bool query scoring?

Is it supposed to be the case that, as of version 5, the queryNorm and coord factors are no longer applied to the bool query scores?

In version 2, for a simple bool query containing an array of two "should" sub-queries, scoring would multiply the summed sub-query scores by the fraction of the sub-queries that matched (i.e. apply a coord factor), and also apply some mysterious queryNorm factor. You could disable the coord factor with the disable_coord flag.

In version 5, the score is simply the sum of the scores of the sub-queries that match.

Was this a deliberate change? I have tried it with versions 5.2 and 5.3.

POST /foo/foo/_bulk
{"index": {}}
{"a": "apple"}
{"index": {}}
{"a": "pear"}
{"index": {}}
{"a": "apple pear"}

GET /foo/foo/_search?explain=true
{
  "query": {
    "bool": {
      "should": [
        {
          "constant_score": {
            "filter": {
              "match": {
                "a": "apple"
              }
            },
            "boost": 3
          }
        },
        {
          "constant_score": {
            "filter": {
              "match": {
                "a": "pear"
              }
            },
            "boost": 4
          }
        }
      ]
    }
  }
}

For the above query, I get scores of 7, 4 and 3 for "apple pear", "pear", and "apple" respectively. On version 2, I get scores of 1.4, 0.4, and 0.3.
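
To make the arithmetic behind those numbers explicit, here is a minimal Python sketch. It assumes the version 2 behaviour is classic TF/IDF with queryNorm = 1 / sqrt(sum of squared boosts) and coord = matched clauses / total clauses; that assumption reproduces the scores I reported, but it is my reading of the explain output rather than a statement of what Lucene does internally. The function names are made up for illustration.

```python
import math

# Boosts of the two constant_score "should" clauses in the query above.
BOOSTS = {"apple": 3.0, "pear": 4.0}


def sum_score(matched):
    """Version 5 behaviour as observed: plain sum of the matching clause boosts."""
    return sum(BOOSTS[term] for term in matched)


def classic_score(matched):
    """Version 2 behaviour, ASSUMING classic TF/IDF queryNorm and coord."""
    # Assumed queryNorm: 1 / sqrt(3^2 + 4^2) = 0.2
    query_norm = 1.0 / math.sqrt(sum(b * b for b in BOOSTS.values()))
    # coord: fraction of the "should" clauses that matched
    coord = len(matched) / len(BOOSTS)
    return sum_score(matched) * query_norm * coord


docs = {
    "apple pear": ["apple", "pear"],
    "pear": ["pear"],
    "apple": ["apple"],
}

for text, matched in docs.items():
    print(text, sum_score(matched), round(classic_score(matched), 4))
```

Under that assumption the version 5 column comes out as 7, 4 and 3 and the version 2 column as 1.4, 0.4 and 0.3, matching what I see.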

QueryNorm and coords will be removed in Lucene 7:
https://issues.apache.org/jira/browse/LUCENE-7347

What you are seeing is probably the effect of BM25 now being the default
similarity in Lucene 6 (and therefore Elasticsearch 5.x). That said, I am
surprised that coords are not in effect since Lucene does not implement
BM25 to the letter. Try changing the similarity back to classic TF/IDF.

You are right. If I configure the "similarity" algorithm to be the old (classic) TF/IDF, then the coord and query-norm factors are applied. This is hinted at towards the end of the section on the Similarity module, where it mentions something about query-norm and coord. I think the documentation could be clearer, but I feel I still don't know enough about how the similarity module works to suggest improvements. Anyway, thanks for the tip, Ivan.
