Script_score with field that does not exist filters out all documents


(Jillesvangurp) #1

I ran into an issue where script_score was dropping documents from my results. In the end, I narrowed it down to the field not yet existing in my mapping, and I solved it by explicitly adding the priority field to my mapping.
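For reference, the mapping fix looks roughly like the body below, sent with a put mapping request such as PUT /my_index/_mapping/my_type. The index name my_index, the type name my_type, and the integer type for priority are placeholders and assumptions, so adjust them to your own setup.

```json
{
  "my_type": {
    "properties": {
      "priority": {
        "type": "integer"
      }
    }
  }
}
```

Once the field is mapped, doc['priority'] can be resolved by the script even for documents that have no value for it.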

The query below returns no documents at all if I have only documents without the priority field.

{
  "query": {
    "filtered": {
      "query": {
        "function_score": {
          "query": {
            "match_all": {}
          },
          "score_mode": "max",
          "functions": [
            {
              "script_score": {
                "script": "doc['priority'].value;"
              }
            }
          ]
        }
      },
      "filter": {
        "and": [
          {
            "ids": {
              "values": [
                "A",
                "B",
                "C"
              ]
            }
          }
        ]
      }
    }
  }
}

In any case, this behavior seems weird. I would expect script_score to never drop documents and merely do something to the score. Instead, it seems to actually drop all documents because the field does not exist in the mapping. As soon as the field mapping exists, the query works as expected.

I was wondering whether to report this as a bug or whether there is some good reason for this somewhat counterintuitive behavior.


(Colin Goodheart-Smithe) #2

It is strange if the presence of a mapping causes this difference in behaviour. I would raise a bug on the Elasticsearch GitHub repo, preferably with a small cURL/Sense recreation and a link to this topic.

Another way you could work around this is to add a small non-zero number to your script (e.g. doc['priority'].value + 0.0001;), or to add it as a separate boost function, so that your final score is always non-zero and documents never get dropped.
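As a rough sketch of the first variant, only the script line changes compared with the original query. The filtered wrapper and the ids filter are left out here for brevity, and the offset 0.0001 is just an arbitrary small value. Note that this only guards against scores of exactly zero.

```json
{
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "score_mode": "max",
      "functions": [
        {
          "script_score": {
            "script": "doc['priority'].value + 0.0001;"
          }
        }
      ]
    }
  }
}
```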

Also, you could use the field_value_factor function here to avoid the use of scripts. It should perform a little better, since scripts are generally slower than native code.
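A minimal sketch of that alternative is below. The factor and modifier values shown are believed to match the defaults and are written out only for clarity. I have not verified how this function behaves when the field is unmapped, so treat it as a starting point rather than a confirmed fix. Newer releases also accept a missing option for documents without a value, as far as I know, but check the documentation for your version before relying on it.

```json
{
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "functions": [
        {
          "field_value_factor": {
            "field": "priority",
            "factor": 1.0,
            "modifier": "none"
          }
        }
      ]
    }
  }
}
```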

Hope that helps


(Jillesvangurp) #3

Thanks, I figured out what is going on:

{
   "took": 17,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 4,
      "failed": 1,
      "failures": [
         {
            "index": "yay",
            "shard": 4,
            "status": 500,
            "reason": "QueryPhaseExecutionException[[yay][4]: query[filtered(filtered(function score (ConstantScore(*:*),function=script[doc['priority'].value;], params [null]))->+_uid:boo#A _uid:boo#B _uid:boo#C)->cache(_type:boo)],from[0],size[10]: Query Failed [Failed to execute main query]]; nested: GroovyScriptExecutionException[ElasticsearchIllegalArgumentException[No field found for [priority] in mapping with types [boo]]]; "
         }
      ]
   },
   "hits": {
      "total": 0,
      "max_score": null,
      "hits": []
   }
}

The request returned a 200 status code even though the script failed on the missing field for every document. Since that happens for all documents, no results are returned; the actual error only shows up in the _shards.failures section of the response. I'm guessing the script is run opportunistically on each hit, in the hope that the next one might succeed.

