Elasticsearch does not score all matched docs?

I have 500k docs in the index.

The query I run is (with profile set to true):

{
    "query": {
        "bool": {
            "should": [
                {
                    "match": {
                        "name": "Foo Bar"
                    }
                }
            ]
        }
    },
    "size": 10,
    "sort": {
        "_score": "desc"
    },
    "profile": true
}

And the result is:

"score_count": 10546

Does this mean that ES computed the score for only 10546 documents?
The number of matched docs is about 450,000. Does this mean ES has some optimization that lets it skip computing the score for docs when it knows their scores will be lower than the score of the 10000th document?

I thought it would work like this: compute the score for every doc that matches the query, then sort them.

What is more, if the same query is executed with _score sorted in ASC order, the profile API returns a score_count of about 450k, which means the score was computed for every document that matches the query.

Here is the response with flag profile set to true:

{
  "profile": {
    "shards": [
      {
        "id": "[Uav4XQTFQAwrUhIowzDmNw][foobar][0]",
        "searches": [
          {
            "query": [
              {
                "type": "BooleanQuery",
                "description": "name:foo name:bar",
                "time_in_nanos": 4189986,
                "breakdown": {
                  "set_min_competitive_score_count": 10,
                  "match_count": 10550,
                  "shallow_advance_count": 0,
                  "set_min_competitive_score": 2976,
                  "next_doc": 1877718,
                  "match": 397822,
                  "next_doc_count": 10550,
                  "score_count": 10546,
                  "compute_max_score_count": 0,
                  "compute_max_score": 0,
                  "advance": 33468,
                  "advance_count": 5,
                  "score": 1572858,
                  "build_scorer_count": 10,
                  "create_weight": 121295,
                  "shallow_advance": 0,
                  "create_weight_count": 1,
                  "build_scorer": 152177
                },
                "children": [
                  {
                    "type": "TermQuery",
                    "description": "name:foo",
                    "time_in_nanos": 2027886,
                    "breakdown": {
                      "set_min_competitive_score_count": 0,
                      "match_count": 0,
                      "shallow_advance_count": 105,
                      "set_min_competitive_score": 0,
                      "next_doc": 0,
                      "match": 0,
                      "next_doc_count": 0,
                      "score_count": 10317,
                      "compute_max_score_count": 105,
                      "compute_max_score": 35785,
                      "advance": 938612,
                      "advance_count": 10343,
                      "score": 892043,
                      "build_scorer_count": 15,
                      "create_weight": 65054,
                      "shallow_advance": 19583,
                      "create_weight_count": 1,
                      "build_scorer": 55923
                    }
                  },
                  {
                    "type": "TermQuery",
                    "description": "name:bar",
                    "time_in_nanos": 358254,
                    "breakdown": {
                      "set_min_competitive_score_count": 10,
                      "match_count": 0,
                      "shallow_advance_count": 108,
                      "set_min_competitive_score": 934,
                      "next_doc": 0,
                      "match": 0,
                      "next_doc_count": 0,
                      "score_count": 1639,
                      "compute_max_score_count": 105,
                      "compute_max_score": 8923,
                      "advance": 154847,
                      "advance_count": 1647,
                      "score": 137159,
                      "build_scorer_count": 15,
                      "create_weight": 28964,
                      "shallow_advance": 10313,
                      "create_weight_count": 1,
                      "build_scorer": 13589
                    }
                  }
                ]
              }
            ],
            "rewrite_time": 11096,
            "collector": [
              {
                "name": "CancellableCollector",
                "reason": "search_cancelled",
                "time_in_nanos": 3086399,
                "children": [
                  {
                    "name": "SimpleTopScoreDocCollector",
                    "reason": "search_top_hits",
                    "time_in_nanos": 2273851
                  }
                ]
              }
            ]
          }
        ],
        "aggregations": []
      }
    ]
  }
}

And here is the response with the flag track_total_hits set to true:

{
  "profile": {
    "shards": [
      {
        "id": "[Uav4XQTFQAwrUhIowzDmNw][foobar][0]",
        "searches": [
          {
            "query": [
              {
                "type": "BooleanQuery",
                "description": "name:foo name:bar",
                "time_in_nanos": 139698953,
                "breakdown": {
                  "set_min_competitive_score_count": 0,
                  "match_count": 0,
                  "shallow_advance_count": 0,
                  "set_min_competitive_score": 0,
                  "next_doc": 63775687,
                  "match": 0,
                  "next_doc_count": 456130,
                  "score_count": 456130,
                  "compute_max_score_count": 0,
                  "compute_max_score": 0,
                  "advance": 1527064,
                  "advance_count": 5,
                  "score": 70029529,
                  "build_scorer_count": 10,
                  "create_weight": 3148307,
                  "shallow_advance": 0,
                  "create_weight_count": 1,
                  "build_scorer": 306090
                },
                "children": [
                  {
                    "type": "TermQuery",
                    "description": "name:foo",
                    "time_in_nanos": 71838139,
                    "breakdown": {
                      "set_min_competitive_score_count": 0,
                      "match_count": 0,
                      "shallow_advance_count": 0,
                      "set_min_competitive_score": 0,
                      "next_doc": 26692491,
                      "match": 0,
                      "next_doc_count": 455898,
                      "score_count": 455898,
                      "compute_max_score_count": 0,
                      "compute_max_score": 0,
                      "advance": 9269,
                      "advance_count": 5,
                      "score": 44081855,
                      "build_scorer_count": 15,
                      "create_weight": 46023,
                      "shallow_advance": 0,
                      "create_weight_count": 1,
                      "build_scorer": 96684
                    }
                  },
                  {
                    "type": "TermQuery",
                    "description": "name:bar",
                    "time_in_nanos": 4989481,
                    "breakdown": {
                      "set_min_competitive_score_count": 0,
                      "match_count": 0,
                      "shallow_advance_count": 0,
                      "set_min_competitive_score": 0,
                      "next_doc": 133219,
                      "match": 0,
                      "next_doc_count": 2409,
                      "score_count": 2409,
                      "compute_max_score_count": 0,
                      "compute_max_score": 0,
                      "advance": 1512687,
                      "advance_count": 5,
                      "score": 247728,
                      "build_scorer_count": 15,
                      "create_weight": 3075122,
                      "shallow_advance": 0,
                      "create_weight_count": 1,
                      "build_scorer": 15886
                    }
                  }
                ]
              }
            ],
            "rewrite_time": 8333,
            "collector": [
              {
                "name": "CancellableCollector",
                "reason": "search_cancelled",
                "time_in_nanos": 122650720,
                "children": [
                  {
                    "name": "SimpleTopScoreDocCollector",
                    "reason": "search_top_hits",
                    "time_in_nanos": 85187889
                  }
                ]
              }
            ]
          }
        ],
        "aggregations": []
      }
    ]
  }
}

PS.
Version 7.4

Ping...

Can anyone answer my questions?

Yes, that is an optimisation at Lucene level. You can find more information in this blog:
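
Separately from the linked post, the general idea can be pictured with a toy model. This is only a sketch under stated assumptions, not Lucene's actual implementation: documents are grouped into fixed-size blocks, each block carries an upper bound on the scores inside it, and a top-k collector skips any block whose bound cannot beat the current k-th best score. The function name and the block layout are made up for illustration.

```python
import heapq


def top_k_with_block_skipping(scores, k, block_size):
    """Toy top-k collector that skips blocks which cannot be competitive.

    `scores` stands in for "the score each doc would get if it were scored".
    Returns (top-k scores in descending order, number of docs actually scored).
    """
    heap = []    # min-heap holding the k best scores seen so far
    scored = 0   # how many individual scores were "computed"
    for start in range(0, len(scores), block_size):
        block = scores[start:start + block_size]
        # Assumption: in a real index this upper bound would be stored at
        # index time; here it is simply taken from the toy data.
        block_max = max(block)
        if len(heap) == k and block_max <= heap[0]:
            continue  # nothing in this block can enter the top k
        for s in block:
            scored += 1
            if len(heap) < k:
                heapq.heappush(heap, s)
            elif s > heap[0]:
                heapq.heapreplace(heap, s)
    return sorted(heap, reverse=True), scored


# 1000 matching docs, but only the first block holds competitive scores.
scores = [1.0] * 10 + [0.1] * 990
top, scored = top_k_with_block_skipping(scores, k=10, block_size=100)
print(scored)  # 100
```

In this toy run only 100 of the 1000 matching docs are scored, which is the same shape as a score_count far below the number of matches. An upper bound like this only helps when the highest scores are wanted, which would be consistent with the ascending-sort run scoring every match.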


Thank you for the answer.

The goal I am trying to achieve is a score that is independent of:

  • number of documents containing term
  • total number of documents with field
  • average length of field

So if I have two identical indices that differ only in the number of documents, the score for e.g. doc-1 will be the same in both indices.
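
To make the dependency concrete, here is a minimal sketch of a per-term BM25 score in its textbook form, next to a frequency-only score. The function names and the sample statistics are made up for illustration, and the exact formula in the Lucene version shipped with 7.4 may differ in constant factors. The defaults k1 = 1.2 and b = 0.75 are the usual BM25 defaults.

```python
import math


def bm25_term_score(freq, doc_len, avg_doc_len, doc_count, doc_freq,
                    k1=1.2, b=0.75):
    """Textbook BM25 score of one term in one document."""
    # idf depends on how many docs have the field (doc_count) and how many
    # of them contain the term (doc_freq).
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    # Length normalisation depends on the average field length in the index.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * freq * (k1 + 1) / (freq + norm)


def freq_only_score(freq):
    """Score that ignores all index-level statistics."""
    return float(freq)


# The same document (term appears twice, field is 3 tokens long) in two
# indices with different statistics. All numbers are illustrative.
small = bm25_term_score(freq=2, doc_len=3, avg_doc_len=3.0,
                        doc_count=1_000, doc_freq=100)
large = bm25_term_score(freq=2, doc_len=3, avg_doc_len=5.0,
                        doc_count=500_000, doc_freq=450_000)
print(small, large)        # two different BM25 scores for the same doc
print(freq_only_score(2))  # identical in both indices
```

All three statistics in the list above enter the BM25 formula, so the same document scores differently once any of them changes, while the frequency-only score stays fixed.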

My initial approach was "scripted similarity" (LINK).

The script was quite simple:

`return doc.freq;`

But it turns out that the performance is not as good as with the BM25 algorithm, because my scripted score needs to be computed for every document. As you mentioned, however, the default BM25 algorithm has optimizations thanks to which it does not compute the score for each document.
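
For reference, a scripted similarity like the one above could be declared roughly as in the sketch below, written as a Python dict that mirrors the index-creation body. The similarity name "freq_only" is hypothetical, and the "name" field is taken from the query earlier in the thread. This is an assumption about the setup, not the exact configuration used here.

```python
import json

# Hypothetical index-creation body. "freq_only" is a made-up similarity
# name; "name" is the field used in the match query earlier in the thread.
index_body = {
    "settings": {
        "similarity": {
            "freq_only": {
                "type": "scripted",
                # Same script as quoted above: score = term frequency in the doc.
                "script": {"source": "return doc.freq;"},
            }
        }
    },
    "mappings": {
        "properties": {
            "name": {"type": "text", "similarity": "freq_only"}
        }
    },
}

# Print the JSON that would be sent when creating the index.
print(json.dumps(index_body, indent=2))
```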

Do you have any suggestions on how to achieve both good performance and the same score across different indices?
