Inconsistent search scoring between shards

Pyppe · January 10, 2019, 9:18am

We have reindexed one big index [shards=1, size=220GB, documents=110M] into a new index spread across multiple nodes (9 shards on 3 nodes), using Elasticsearch 6.5.4 with Java 1.8.0_191.

However, we are now seeing that search scoring is not consistent between shards (not 100% sure, but it might only happen when the shards are on different nodes). Documents that are identical as far as the search criteria are concerned can get quite different scores. Is this really expected behavior? How can we get consistent results?

Example query:

{
  "query": {
    "bool": {
      "must": [
        {
          "bool": {
            "should": [
              {
                "match_phrase": {
                  "searchableName.exact": {
                    "query": "example",
                    "boost": 3.0
                  }
                }
              },
              {
                "match_phrase": {
                  "searchableName": {
                    "query": "example",
                    "boost": 2.0
                  }
                }
              },
              {
                "prefix": {
                  "searchableName": {
                    "value": "example",
                    "boost": 1.45
                  }
                }
              },
              {
                "prefix": {
                  "searchableName.lower_latin": {
                    "value": "example",
                    "boost": 1.45
                  }
                }
              },
              {
                "prefix": {
                  "searchableName.reverse_lower_latin": {
                    "value": "elpmaxe",
                    "boost": 1.0
                  }
                }
              }
            ]
          }
        },
        {
          "bool": {
            "should": [
              {
                "term": {
                  "status": {
                    "value": "Valid",
                    "boost": 0.9
                  }
                }
              },
              {
                "term": {
                  "status": {
                    "value": "Pending",
                    "boost": 0.09
                  }
                }
              },
              {
                "term": {
                  "status": {
                    "value": "GracePeriod",
                    "boost": 0.009
                  }
                }
              },
              {
                "term": {
                  "status": {
                    "value": "Expired",
                    "boost": 0.0009
                  }
                }
              },
              {
                "term": {
                  "status": {
                    "value": "Invalid",
                    "boost": 0.00009
                  }
                }
              },
              {
                "term": {
                  "status": {
                    "value": "Unknown",
                    "boost": 0.000009
                  }
                }
              }
            ]
          }
        }
      ]
    }
  }
}

Example of two hits with different scores: https://pastebin.com/0TU4ngPc

Pyppe · January 10, 2019, 9:51am

Reading the documentation (https://www.elastic.co/guide/en/elasticsearch/guide/current/scoring-theory.html), it seems that this is more or less the expected behavior, since the TF/IDF statistics are calculated per shard. Ignoring TF/IDF altogether (https://www.elastic.co/guide/en/elasticsearch/guide/current/ignoring-tfidf.html) is probably the thing to do in our case.
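
To illustrate why per-shard statistics lead to different scores, here is a minimal sketch. It assumes the default BM25 similarity and uses the IDF formula from Lucene's BM25Similarity; the shard names and document counts are made up for illustration and are not taken from our cluster. It only models the IDF part of the score, not the full query above.

```python
import math


def bm25_idf(doc_count: int, doc_freq: int) -> float:
    """BM25-style IDF: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


# Hypothetical per-shard statistics for a single term:
# (documents in the shard, documents in the shard containing the term)
shard_stats = {
    "shard_0": (12_000_000, 1_200),
    "shard_1": (12_000_000, 4_800),
}

# Default behavior (query_then_fetch): each shard scores with its own statistics.
local_idf = {
    name: bm25_idf(n_docs, doc_freq)
    for name, (n_docs, doc_freq) in shard_stats.items()
}

# With global statistics, the counts are summed over all shards first,
# so every shard would use the same IDF for the term.
total_docs = sum(n_docs for n_docs, _ in shard_stats.values())
total_doc_freq = sum(doc_freq for _, doc_freq in shard_stats.values())
global_idf = bm25_idf(total_docs, total_doc_freq)

for name, idf in local_idf.items():
    print(f"{name}: local idf = {idf:.3f}")
print(f"all shards: global idf = {global_idf:.3f}")
```

The same term gets a higher IDF on the shard where it is rarer, so two otherwise identical documents end up with different scores depending on which shard they live in.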

Mark_Harwood · January 10, 2019, 10:18am

See https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-search-type.html#dfs-query-then-fetch

Pyppe · January 10, 2019, 1:22pm

Hmm. Setting it in the request body

{
  "search_type": "dfs_query_then_fetch",
  "query": {
    "match_all": {}
  }
}

gives me an error:

{
  "error": {
    "root_cause": [
      {
        "type": "parsing_exception",
        "reason": "Unknown key for a VALUE_STRING in [search_type].",
        "line": 2,
        "col": 18
      }
    ],
    "type": "parsing_exception",
    "reason": "Unknown key for a VALUE_STRING in [search_type].",
    "line": 2,
    "col": 18
  },
  "status": 400
}

Mark_Harwood · January 10, 2019, 1:25pm

It's a setting passed as a URL parameter to the REST API, as opposed to a field in the JSON body.
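
As a rough sketch of what that looks like in practice: the helper below only builds the URL and the body, it does not send anything. The host `http://localhost:9200`, the index name `my-index` and the function `build_search_request` are placeholders made up for this example, not anything from the thread.

```python
import json
from urllib.parse import urlencode


def build_search_request(base_url, index, body, **url_params):
    """Return (url, payload) for a _search call.

    url_params go into the query string (e.g. search_type),
    while body is serialized as the JSON request body.
    """
    url = f"{base_url}/{index}/_search"
    query_string = urlencode(url_params)
    if query_string:
        url = f"{url}?{query_string}"
    return url, json.dumps(body).encode("utf-8")


# Hypothetical host and index name, for illustration only.
url, payload = build_search_request(
    "http://localhost:9200",
    "my-index",
    {"query": {"match_all": {}}},
    search_type="dfs_query_then_fetch",
)

print(url)
print(payload.decode("utf-8"))
```

The point is simply that `search_type` ends up in the query string and the JSON body contains only the query itself.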

Pyppe · January 10, 2019, 3:47pm

Thanks so much, that did the trick!

PS. I find it a bit confusing that there are three exceptions (search_type, request_cache & allow_partial_search_results) that are not allowed in the body and can only be given in the query string. I'd argue you should be able to define the search criteria using the body alone. E.g. testing would be easier if there were a single JSON structure that defined the criteria as a whole.

Exceptions are bad, mmkay?

Mark_Harwood · January 10, 2019, 3:53pm

I'm not 100% clear on the rationale myself, but part of it may be that the software channeling search requests (e.g. Kibana) can introduce application-wide policies like search timeouts by setting these parameters in a single place, without tampering with the contents of the request bodies, which can vary from request to request and come from many different parts of the application.

Pyppe · January 10, 2019, 4:00pm

That's a valid argument, I guess. But why not allow them in both the query string and the body, then? (And next we could argue about which one should take precedence if both exist...)

But anyway, thanks again for letting me know about the search_type parameter.

