Are the queryNorm and coord factors no longer applied to bool query scoring?


(David Kemp) #1

Is it supposed to be the case that, as of version 5, the queryNorm and coord factors are no longer applied to the bool query scores?

In version 2, for a simple bool query containing an array of two "should" sub-queries, scoring would multiply the sum of the matching sub-query scores by the fraction of the sub-queries that matched (i.e. apply a coord factor), and also apply some mysterious queryNorm factor. You could disable the coord factor with a flag (disable_coord).

In version 5, the score is simply the sum of the scores of the sub-queries that match.

Was this a deliberate change? I have tried it with versions 5.2 and 5.3.

POST /foo/foo/_bulk
{"index": {}}
{"a": "apple"}
{"index": {}}
{"a": "pear"}
{"index": {}}
{"a": "apple pear"}

GET /foo/foo/_search?explain=true
{
  "query": {
    "bool": {
      "should": [
        {
          "constant_score": {
            "filter": {
              "match": {
                "a": "apple"
              }
            },
            "boost": 3
          }
        },
        {
          "constant_score": {
            "filter": {
              "match": {
                "a": "pear"
              }
            },
            "boost": 4
          }
        }
      ]
    }
  }
}

For the above query, I get scores of 7, 4 and 3 for "apple pear", "pear", and "apple" respectively. On version 2, I get scores of 1.4, 0.4, and 0.3.
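The two sets of numbers can be reproduced by hand, which makes the difference easier to see. The following is only a rough sketch, assuming the classic TF/IDF definitions queryNorm = 1 / sqrt(sum of squared clause weights) and coord = matched clauses / total clauses, and assuming each constant_score clause contributes exactly its boost. The function names are made up for illustration and are not part of any Elasticsearch or Lucene API.

```python
import math


def classic_bool_score(boosts, matched):
    """Score a bool/should of constant_score clauses with queryNorm and coord.

    boosts  -- boost of every should clause in the query
    matched -- indexes of the clauses that matched the document
    """
    # Assumption: the weight of a constant_score clause is just its boost.
    query_norm = 1.0 / math.sqrt(sum(b * b for b in boosts))
    coord = len(matched) / len(boosts)
    return sum(boosts[i] for i in matched) * query_norm * coord


def plain_sum_score(boosts, matched):
    """Score the same query as a plain sum of the matching clause scores."""
    return float(sum(boosts[i] for i in matched))


BOOSTS = [3, 4]  # "apple" clause, "pear" clause
DOCS = {"apple pear": [0, 1], "pear": [1], "apple": [0]}

for text, matched in DOCS.items():
    classic = round(classic_bool_score(BOOSTS, matched), 4)
    plain = plain_sum_score(BOOSTS, matched)
    print(text, classic, plain)
```

With boosts 3 and 4 the queryNorm works out to 1 / sqrt(9 + 16) = 0.2, so the sketch gives 1.4, 0.4 and 0.3 for the version 2 behaviour and 7, 4 and 3 for the plain sum, matching the scores reported above.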


(Ivan Brusic) #2

QueryNorm and coords will be removed in Lucene 7:
https://issues.apache.org/jira/browse/LUCENE-7347

What you are seeing is probably the effect of BM25 now being the default
similarity in Lucene 6 (and therefore Elasticsearch 5.x). That said, I am
surprised that coords are not in effect since Lucene does not implement
BM25 to the letter. Try changing it back to the classic TF/IDF similarity.


(David Kemp) #3

You are right. If I configure the "similarity" algorithm to be the old (classic) TF/IDF, then the coord and queryNorm factors are applied. This is hinted at towards the end of the documentation section on the Similarity module, which mentions queryNorm and coord. I think the documentation could be clearer, but I feel I still don't know enough about how the similarity module works to suggest improvements. Anyway, thanks for the tip, Ivan.
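For anyone wanting to try the same thing, here is a minimal sketch of the kind of index settings involved. It assumes the index-level default similarity is set under index.similarity.default with type "classic" when the index is created, which is how I understand the 5.x settings; the index name foo comes from the example earlier in the thread, and the helper names are hypothetical. The script only builds and prints the request body, it does not talk to a cluster.

```python
import json

# Assumption: "foo" is the index used in the example earlier in this thread.
INDEX_NAME = "foo"


def classic_similarity_settings():
    """Return index settings that make classic TF/IDF the default similarity.

    Assumption: the setting path and the "classic" type name follow the
    5.x similarity module settings as I understand them.
    """
    return {
        "settings": {
            "index": {
                "similarity": {
                    "default": {"type": "classic"}
                }
            }
        }
    }


def create_index_request(index_name):
    """Format the settings as a console-style index creation request."""
    body = json.dumps(classic_similarity_settings(), indent=2)
    return "PUT /{}\n{}".format(index_name, body)


print(create_index_request(INDEX_NAME))
```

The printed request would need to be sent before indexing the documents, since the index has to exist with these settings before the bulk request runs.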

