Issues with 7.0.0 nested + script query

I opened this bug https://github.com/elastic/elasticsearch/issues/41323

The details are in there.

Basically, when I use the query in that bug with the mapping there, I end up in the following situation.

  1. Without nested in the query : no errors
  2. If the query uses doc['x'].value : error field does not have values (even though all the documents do have an x field set)
  3. If the query checks doc['x'].size() : I originally reported the error "Double does not have a size function", but it appears this does work. I had typoed doc['x'].size() as doc['x'].value.size(), so this is now the same as option 5
  4. If the query checks doc.contains('x') : the check passes in every circumstance and the query ends up with the same error as 2
  5. doc.x.size() : worked and managed to get my query to pass

doc.x.size() wasn't mentioned in the error message, and I suspect it should be. It's interesting that the error message says something different from the documentation (which suggests using contains).

Can anyone shed any light on this? I can attempt to produce a smaller example which reliably replicates it.

I have edited to correct one of my mistakes (a typo in my panic very early in the morning).

This still doesn't explain why the nested query with the script fails when all of my documents have all the fields set.

I have managed to produce a smaller set of instructions to replicate it.

curl -X DELETE  "localhost:9200/systems_test"

echo

curl -X PUT "localhost:9200/systems_test" -H 'Content-Type: application/json' -d '{
"settings":{"index":{"number_of_shards":"1","store":{"preload":["dvd","tim"]},"analysis":{"filter":{"autocomplete_filter":{"type":"edge_ngram","min_gram":"1","max_gram":"20"}},"normalizer":{"lowercase":{"filter":["lowercase"],"type":"custom"}},"analyzer":{"autocomplete":{"filter":["lowercase","autocomplete_filter"],"type":"custom","tokenizer":"standard"}}},"number_of_replicas":"1"}}
}'

echo

curl -X PUT "localhost:9200/systems_test/_mapping" -H 'Content-Type: application/json' -d '{
    "properties":{"allegiance":{"type":"text","fields":{"keyword":{"type":"keyword","eager_global_ordinals":true}}},"bodies":{"type":"nested","properties":{"distance_to_arrival":{"type":"double"},"edsm_id":{"type":"keyword"},"estimated_mapping_value":{"type":"long"},"estimated_scan_value":{"type":"long"},"id":{"type":"keyword"},"id64":{"type":"keyword"},"is_main_star":{"type":"boolean"},"name":{"type":"text","fields":{"keyword":{"type":"keyword"},"suggest":{"type":"text","analyzer":"autocomplete","search_analyzer":"standard"}}},"subtype":{"type":"text","fields":{"keyword":{"type":"keyword"}}},"terraforming_state":{"type":"text","fields":{"keyword":{"type":"keyword"}}},"type":{"type":"text","fields":{"keyword":{"type":"keyword"}}}}},"controlling_minor_faction":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"edsm_id":{"type":"long"},"estimated_scan_value":{"type":"long"},"government":{"type":"text","fields":{"keyword":{"type":"keyword","eager_global_ordinals":true}}},"id":{"type":"keyword"},"id64":{"type":"keyword"},"minor_faction_presence":{"properties":{"influence":{"type":"float"},"name":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},"state":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}}}},"minor_faction_presences":{"type":"nested","properties":{"influence":{"type":"double"},"name":{"type":"text","fields":{"keyword":{"type":"keyword"}}},"state":{"type":"text","fields":{"keyword":{"type":"keyword"}}}}},"name":{"type":"text","fields":{"keyword":{"type":"keyword"},"lkeyword":{"type":"keyword","normalizer":"lowercase"},"suggest":{"type":"text","analyzer":"autocomplete","search_analyzer":"standard"}}},"needs_permit":{"type":"boolean"},"population":{"type":"long"},"power":{"type":"text","fields":{"keyword":{"type":"keyword","eager_global_ordinals":true}}},"power_state":{"type":"text","fields":{"keyword":{"type":"keyword","eager_global_ordinals":true}}},"primary_economy":{"type":"text","fields":{"keyword":{"type":"keyword","eager_global_ordinals":true}}},"secondary_economy":{"type":"text","fields":{"keyword":{"type":"keyword","eager_global_ordinals":true}}},"security":{"type":"text","fields":{"keyword":{"type":"keyword","eager_global_ordinals":true}}},"state":{"type":"text","fields":{"keyword":{"type":"keyword","eager_global_ordinals":true}}},"x":{"type":"double"},"y":{"type":"double"},"z":{"type":"double"}}
}'

echo
curl -X POST "localhost:9200/systems_test/_doc" -H 'Content-Type: application/json' -d '{
   "population" : 0,
   "z" : -31.84375,
   "needs_permit" : false,
   "y" : 6.5625,
   "bodies" : [
      {
         "edsm_id" : 21273560,
         "id64" : 21273560,
         "name" : "Wregoe DI-O b47-12 11",
         "id" : 21273560,
         "distance_to_arrival" : 1728,
         "is_main_star" : false,
         "estimated_scan_value" : 500,
         "estimated_mapping_value" : 1666,
         "subtype" : "Icy body",
         "type" : "Planet"
      }
   ],
   "id64" : 27067823498649,
   "edsm_id" : 8081851,
   "x" : 641.90625,
   "name" : "Wregoe DI-O b47-12"
}'

echo
curl -X POST "localhost:9200/systems_test/_doc" -H 'Content-Type: application/json' -d '{
               "edsm_id" : 8081852,
               "id64" : 60818917171114,
               "name" : "Boelts QM-Q c19-221",
               "x" : -7720,
               "y" : -983.625,
               "bodies" : [
                  {
                     "type" : "Star",
                     "estimated_scan_value" : 1214,
                     "estimated_mapping_value" : 4046,
                     "subtype" : "G (White-Yellow) Star",
                     "is_main_star" : true,
                     "name" : "Boelts QM-Q c19-221 A",
                     "id64" : 14982802,
                     "edsm_id" : 14982802,
                     "distance_to_arrival" : 0,
                     "id" : 14982802
                  }
               ],
               "z" : 16417.4375,
               "needs_permit" : false,
               "population" : 0
            }'

echo
curl -H "Content-type: application/json" -XGET 'http://localhost:9200/systems_test/_search?pretty=true&size=100' -d '
{
   "query" : {
      "bool" : {
         "must" : [
            {
               "script" : {
                  "script" : {
                     "source" : "\n                    double dx = -9530.5 - 0;\n                    double dy = -910.28125 - 0;\n                    double dz = 19808.125 - 0;\n\n                    double lengthsq = dx*dx + dy*dy + dz*dz;\n\n                    double pdx = doc.x.value - 0;\n                    double pdy = doc.y.value - 0;\n                    double pdz = doc[\u0027z\u0027].value - 0;\n\n                    double dot = pdx * dx + pdy * dy + pdz * dz;\n\nif ( dot < 0 || dot > lengthsq) {\n                        return false;\n                    }\n\n                    double dsq = (pdx*pdx + pdy*pdy + pdz*pdz) - dot*dot/lengthsq;\n                    if (dsq > 250000d) {\n                        return false;\n                    }\n\n                    return true;\n                 ",
                     "lang" : "painless"
                  }
               }
            },
            {
               "nested" : {
                  "query" : {
                     "function_score" : {
                        "script_score" : {
                           "script" : "doc[\u0027bodies.estimated_scan_value\u0027].value"
                        },
                        "score_mode" : "sum"
                     }
                  },
                  "inner_hits" : {
                     "size" : 20
                  },
                  "path" : "bodies",
                  "score_mode" : "sum"
               }
            }
         ]
      }
   }
}
'

It works if you change the query above to the one below

curl -H "Content-type: application/json" -XGET 'http://localhost:9200/systems_test/_search?pretty=true&size=100' -d '
{
   "query" : {
      "bool" : {
         "must" : [
            {
               "script" : {
                  "script" : {
                     "source" : "if (doc[\u0027x\u0027].size() == 0 || doc[\u0027y\u0027].size() == 0 || doc[\u0027z\u0027].size() == 0) {return false;}\n                    double dx = -9530.5 - 0;\n                    double dy = -910.28125 - 0;\n                    double dz = 19808.125 - 0;\n\n                    double lengthsq = dx*dx + dy*dy + dz*dz;\n\n                    double pdx = doc.x.value - 0;\n                    double pdy = doc.y.value - 0;\n                    double pdz = doc[\u0027z\u0027].value - 0;\n\n                    double dot = pdx * dx + pdy * dy + pdz * dz;\n\nif ( dot < 0 || dot > lengthsq) {\n                        return false;\n                    }\n\n                    double dsq = (pdx*pdx + pdy*pdy + pdz*pdz) - dot*dot/lengthsq;\n                    if (dsq > 250000d) {\n                        return false;\n                    }\n\n                    return true;\n                 ",
                     "lang" : "painless"
                  }
               }
            },
            {
               "nested" : {
                  "query" : {
                     "function_score" : {
                        "script_score" : {
                           "script" : "doc[\u0027bodies.estimated_scan_value\u0027].value"
                        },
                        "score_mode" : "sum"
                     }
                  },
                  "inner_hits" : {
                     "size" : 20
                  },
                  "path" : "bodies",
                  "score_mode" : "sum"
               }
            }
         ]
      }
   }
}
'

However, as you can see from the data, both records have x, y and z fields.

The first query (the one without the size() check) also works fine if you remove the nested part:

curl -H "Content-type: application/json" -XGET 'http://localhost:9200/systems_test/_search?pretty=true&size=100' -d '
{
   "query" : {
      "bool" : {
         "must" : [
            {
               "script" : {
                  "script" : {
                     "source" : "\n                    double dx = -9530.5 - 0;\n                    double dy = -910.28125 - 0;\n                    double dz = 19808.125 - 0;\n\n                    double lengthsq = dx*dx + dy*dy + dz*dz;\n\n                    double pdx = doc.x.value - 0;\n                    double pdy = doc.y.value - 0;\n                    double pdz = doc[\u0027z\u0027].value - 0;\n\n                    double dot = pdx * dx + pdy * dy + pdz * dz;\n\nif ( dot < 0 || dot > lengthsq) {\n                        return false;\n                    }\n\n                    double dsq = (pdx*pdx + pdy*pdy + pdz*pdz) - dot*dot/lengthsq;\n                    if (dsq > 250000d) {\n                        return false;\n                    }\n\n                    return true;\n                 ",
                     "lang" : "painless"
                  }
               }
            }
         ]
      }
   }
}
'
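
For anyone trying to follow the script: as far as I can tell it tests whether a system lies within a fixed distance of the line segment between two points (here the origin and -9530.5, -910.28125, 19808.125). Below is a rough Python restatement of the same arithmetic for checking values offline. The function name and the tuple layout are made up for this illustration and are not part of the original query.

```python
def within_cylinder(p, start, end, radius_squared):
    """Mirror of the Painless script: True if p is close to the segment start->end."""
    # Direction of the segment and its squared length.
    dx, dy, dz = (end[i] - start[i] for i in range(3))
    lengthsq = dx * dx + dy * dy + dz * dz
    # Vector from the start of the segment to the point.
    pdx, pdy, pdz = (p[i] - start[i] for i in range(3))
    # Unnormalised projection of the point onto the segment direction.
    dot = pdx * dx + pdy * dy + pdz * dz
    if dot < 0 or dot > lengthsq:
        return False  # projects before the start or past the end
    # Squared perpendicular distance from the point to the line.
    dsq = (pdx * pdx + pdy * pdy + pdz * pdz) - dot * dot / lengthsq
    return dsq <= radius_squared


start = (0.0, 0.0, 0.0)
end = (-9530.5, -910.28125, 19808.125)
wregoe = (641.90625, 6.5625, -31.84375)      # first sample document
boelts = (-7720.0, -983.625, 16417.4375)     # second sample document

print(within_cylinder(wregoe, start, end, 250000.0))
print(within_cylinder(boelts, start, end, 250000.0))
```

With the two sample documents, the first projects behind the start of the segment and is rejected, while the second should pass at this radius. Neither document is missing x, y or z, which is why the "field does not have values" error is surprising.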

Thanks, I can reproduce with your example. The error appears because the script query is evaluated on a nested document. This is a side effect of two things:

  • The two clauses must evaluate all documents to get matches.
  • The root script query has a smaller cost, since it will be evaluated on root documents only.

Under these circumstances the root script query is sometimes executed on a nested document that will later be filtered out by the other nested clause. It is kind of an edge case since it only happens on non-restrictive queries (script_score), but I agree that this is unexpected for users. If you add the protection against empty fields the query works fine, so this is not really a bug but rather an implementation detail. However, since this behavior is not what users expect, I'll open an issue to discuss it further.
Thanks for reporting
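
To illustrate the explanation above: nested objects are indexed as separate hidden Lucene documents stored alongside their root document, so a root-level script can be handed a document that has no x, y or z. The following is a toy Python model of that situation, not Elasticsearch code. The dictionaries and function names are invented for the illustration, and the script body is reduced to a trivial check.

```python
# Toy model: each nested "bodies" entry is its own hidden document,
# stored next to the root document it belongs to.
segment_docs = [
    {"bodies.estimated_scan_value": 500},               # nested doc
    {"x": 641.90625, "y": 6.5625, "z": -31.84375},      # root doc
    {"bodies.estimated_scan_value": 1214},              # nested doc
    {"x": -7720.0, "y": -983.625, "z": 16417.4375},     # root doc
]


def unguarded(doc):
    # Like doc['x'].value: fails when the field has no value.
    return doc["x"] < 0


def guarded(doc):
    # Like checking doc['x'].size() == 0 before reading the value.
    if any(field not in doc for field in ("x", "y", "z")):
        return False
    return doc["x"] < 0


def run(script, docs):
    # Evaluate the script on every document, root or nested.
    return [script(doc) for doc in docs]


print(run(guarded, segment_docs))
try:
    run(unguarded, segment_docs)
except KeyError as missing:
    print("script failed on a document without", missing)
```

The guarded version simply reports "no match" for the nested documents, which is the effect the size() check has in the real query.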

Thanks, the protection is certainly good to have, even though in this case the main document will never be missing those fields. It would be good to get some insight into why the script ends up being executed on the nested document, since that would never match and will cost some cycles.

Is there a way I can restructure to improve the search performance by making sure that the script will only ever be executed on the main document (this may also improve some other script queries I have which all relate to distance)?

You can add a restrictive clause in the nested query or in the root query. It is usually not a best practice to use the script query or function_score without a real query. It is equivalent to brute force, where you need to evaluate all documents. We usually recommend using these queries in a rescorer or in conjunction with another query that restricts the result to a subset of documents.
Otherwise you'll need to add protection against all types of documents that you have in your index, including nested docs. I opened https://github.com/elastic/elasticsearch/issues/41339 to discuss adding the protection automatically.

There is more to the query however I stripped it down to minimally replicate the bug.
That said, unfortunately, often the largest source of result reduction is a combination of the nested and distance filter. I'm aware of the downside of script clauses against the entire corpus but unfortunately there isn't currently a better way of doing this.

One of the reasons I was so eager to move to ES 7 was the introduction of multi dimensional indexes so I can potentially implement a reasonable 3d distance plugin which doesn't require a script clause for sorting (nearest neighbour).

By a restrictive clause do you mean something like adding an extra clause into the main query or nested query? In the full query the nested query already does have some extra range (gte) filters for the distance_to_arrival and estimated_scan_value fields. The real main query (which did have the problem as it was where I noticed it) has an extra range clause for population gte 0 as well (there are several others).

By adding protection do you just mean defensively checking if the field has values as I have currently done?

Here is the full query. It is slightly altered, as when investigating the issue last night I spotted several things which could be improved and have since made those changes, but it still exposes the bug on my dev server when I remove the protection.

curl -H "Content-type: application/json" -XGET 'http://localhost:9200/systems/_doc/_search?pretty=1&size=100' -d '
{
   "query" : {
      "bool" : {
         "must" : [
            {
               "bool" : {
                  "must_not" : [
                     {
                        "ids" : {
                           "values" : [
                              "iDm1KmoB5PMt6Dc9qK9b",
                              "Z3-6KmoB5PMt6Dc9zr5U"
                           ]
                        }
                     }
                  ]
               }
            },
            {
               "term" : {
                  "population" : 0
               }
            },
            {
               "nested" : {
                  "path" : "bodies",
                  "inner_hits" : {
                     "size" : 20
                  },
                  "query" : {
                     "function_score" : {
                        "score_mode" : "sum",
                        "query" : {
                           "bool" : {
                              "must_not" : [],
                              "must" : [
                                 {
                                    "range" : {
                                       "bodies.distance_to_arrival" : {
                                          "gte" : 0,
                                          "lte" : 1000000
                                       }
                                    }
                                 },
                                 {
                                    "range" : {
                                       "bodies.estimated_scan_value" : {
                                          "gte" : "100000"
                                       }
                                    }
                                 }
                              ]
                           }
                        },
                        "field_value_factor" : {
                           "missing" : 0,
                           "field" : "bodies.estimated_scan_value"
                        }
                     }
                  },
                  "score_mode" : "sum"
               }
            },
            {
               "script" : {
                  "script" : {
                     "lang" : "painless",
                     "source" : "\n                    if (doc.x.size() == 0 || doc.y.size() == 0 || doc.z.size() == 0) {\n                        return false;\n                    }\n                    double dx = params.dx - params.sx;\n                    double dy = params.dy - params.sy;\n                    double dz = params.dz - params.sz;\n\n                    double lengthsq = dx*dx + dy*dy + dz*dz;\n\n                    double pdx = doc[\u0027x\u0027].value - params.sx;\n                    double pdy = doc[\u0027y\u0027].value - params.sy;\n                    double pdz = doc[\u0027z\u0027].value - params.sz;\n\n                    double dot = pdx * dx + pdy * dy + pdz * dz;\n\n                    if ( dot < 0 || dot > lengthsq) {\n                        return false;\n                    }\n\n                    double dsq = (pdx*pdx + pdy*pdy + pdz*pdz) - dot*dot/lengthsq;\n                    if (dsq > params.radius_squared) {\n                        return false;\n                    }\n\n                    return true;\n                 ",
                     "params" : {
                        "dx" : -9530.5,
                        "radius_squared" : 625.0,
                        "sz" : 0,
                        "sy" : 0,
                        "dz" : 19808.125,
                        "dy" : -910.28125,
                        "sx" : 0
                     }
                  }
               }
            }
         ]
      }
   },
   "stored_fields" : [
      "_source"
   ]
}
'

Sorry I wasn't clear: you need to add the protection if you have nested fields, even if you have another restrictive clause. The restrictive clause will lead the search, but the script query will be executed to ensure that the document matches. If it doesn't, we try the next document, which by design can be a nested document, and execute the script again until we find a matching document.

By adding protection do you just mean defensively checking if the field has values as I have currently done?

No, I am talking about adding a filter that evaluates the script only if it's a root document. That's what we need to discuss in the GitHub issue.

Is there any way I can add that myself as a temporary measure?

Nope, I don't think so. The issue is related to a limitation we have in Lucene. Usually we don't execute the script on documents that don't match the other clause; this is what we call two-phase queries. However, in 7.0 we use a new type of boolean query for must clauses that is not able to run queries like this. I opened https://issues.apache.org/jira/browse/LUCENE-8770 in Lucene to fix this behavior entirely. In the meantime you'll need to check if the value is present to eliminate the nested documents.

OK thanks. I wasn't intending to eliminate the check; I was just wondering if I could improve the performance by keeping nested documents away from the script with a temporary fix.

For anyone else who happens to have related issues, I managed to improve the performance of the query quite substantially after some debugging and a few tests (though I still need the checks for doc values in place). It turns out that quite a few of my filters had stopped being classed as filters and had started being scored at some point during the upgrade from 5 to 7 (I can't remember if I saw the performance drop when I went to 6). After forcing them into "filter" context, the time taken by the queries dropped substantially.

I also put a pre-filter for a bounding box on the distance query by simply doing a min/max range on the x, y and z fields, then filtering down more exactly with the script.
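
A rough sketch of what that restructuring might look like, written in Python so the request body can be generated and inspected. This is an assumption about the shape of the final query, not the exact one in use: the helper names are hypothetical, the script source is a placeholder, and the box is simply each axis range of the segment widened by the radius.

```python
import json


def bounding_box_filters(start, end, radius):
    """Range clauses for a box that encloses the cylinder around start->end."""
    clauses = []
    for axis, s, e in zip(("x", "y", "z"), start, end):
        clauses.append(
            {"range": {axis: {"gte": min(s, e) - radius, "lte": max(s, e) + radius}}}
        )
    return clauses


def build_query(start, end, radius, script_source):
    params = {
        "sx": start[0], "sy": start[1], "sz": start[2],
        "dx": end[0], "dy": end[1], "dz": end[2],
        "radius_squared": radius * radius,
    }
    script = {
        "script": {
            "script": {"lang": "painless", "source": script_source, "params": params}
        }
    }
    # Clauses under "filter" must match but do not contribute to the score.
    # The cheap range clauses come first, the exact script check last.
    return {
        "query": {
            "bool": {"filter": bounding_box_filters(start, end, radius) + [script]}
        }
    }


# Placeholder source: the real guarded distance script would go here.
body = build_query((0, 0, 0), (-9530.5, -910.28125, 19808.125), 25.0, "return true;")
print(json.dumps(body, indent=2))
```

The size() guard is still needed inside the script itself, as discussed earlier in the thread. The range clauses only reduce how many documents reach it.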
