Hi,
I am migrating from 5.6.x to 7.5.1.
Testing shows that the existing application's use of query_string queries (either via the DSL or implicitly in URI search) is breaking.
I believe this is because of changes to how these queries are planned, as a result of e.g. the deprecation and removal of the _all field.
I hope to clarify my understanding of what changed and what my options are.
For reasons outside my control, I have a production index with (too) many fields, numbering in the many thousands.
As far as I can tell, as of 7.5.1 a trivial query_string query containing a key-value pair like "firstname:terry" does not result in "automatic detection" that only a single field, firstname, should be searched.
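For concreteness, here is a minimal sketch of the two request shapes I mean, built in Python. The index name "people" is hypothetical; the query string itself is the one from my example.

```python
import json
from urllib.parse import urlencode

# Hypothetical index name, used only for illustration.
INDEX = "people"

# DSL form: a query_string query with no default_field or fields set,
# so the index-level default (index.query.default_field) applies.
dsl_body = {
    "query": {
        "query_string": {
            "query": "firstname:terry"
        }
    }
}

# URI search form: the same query string passed via the q parameter.
uri_search = "/{}/_search?{}".format(INDEX, urlencode({"q": "firstname:terry"}))

print(json.dumps(dsl_body, indent=2))
print(uri_search)
```

Both forms carry exactly the same query string, and neither names a default field explicitly.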
Instead, I get the "field expansion matches too many fields" error. I understand this is because the default_field setting "defaults to the index.query.default_field index setting, which has a default value of *" (https://www.elastic.co/guide/en/elasticsearch/reference/7.5/query-dsl-query-string-query.html#query-string-top-level-params).
In my case I understand (I think) that as a result the query planner is throwing an error instead of quietly producing "many thousands" of OR'd clauses.
What I don't understand, in a nutshell, is why the appropriate default_field scope is not extracted automatically for a trivial query like my simple key-value example.
Should it be? Can it be?
That is:
• is this "failure" to detect that only a single field is needed in this trivial case a bug, or expected?
• if it's expected, did query_string parsing ever extract the single field? Or have queries always, by default, been run against the _all field, so that it just "quietly worked" (if inefficiently) in the past?
• is there any mechanism today to provoke automatic detection of "relevant fields" from a query, when e.g. it only contains structured queries like key-value pairs?
As far as I can tell, in my case the only way to handle existing legacy queries successfully and efficiently will be to insert a step that formally parses each query to identify all fields explicitly referenced (adding the fields that any unstructured "organic" terms should be searched against in our application).
Is that indeed the case?
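A rough sketch of the pre-parsing step I have in mind, in Python. This is not a real Lucene query parser: it is a regex heuristic that skips quoted phrases and picks up anything that looks like a field prefix, and it does not treat special forms such as _exists_:field differently. The helper names extract_fields and build_body are my own, and whether passing the resulting list as the fields parameter actually avoids the "too many fields" error in 7.5.1 is an assumption I have not verified.

```python
import re

# Quoted phrases are blanked out first so that a colon inside a phrase
# (e.g. "a:b") is not mistaken for a field prefix.
_QUOTED = re.compile(r'"(?:\\.|[^"\\])*"')

# A field prefix: a name at the start of the string, or after whitespace,
# "(", "+", "-" or "!", immediately followed by an unescaped colon.
_FIELD = re.compile(r'(?:^|[\s(+\-!])([A-Za-z_][\w.]*):')


def extract_fields(query):
    """Return field names explicitly referenced in a query string, in order."""
    unquoted = _QUOTED.sub(" ", query)
    fields = []
    for match in _FIELD.finditer(unquoted):
        name = match.group(1)
        if name not in fields:
            fields.append(name)
    return fields


def build_body(query, free_text_fields):
    """Build a query_string body limited to the referenced fields plus the
    application's chosen free-text fields (hypothetical helper)."""
    fields = extract_fields(query)
    for name in free_text_fields:
        if name not in fields:
            fields.append(name)
    return {"query": {"query_string": {"query": query, "fields": fields}}}


legacy = 'firstname:terry AND (lastname:"a:b" OR age:[10 TO 20]) free text'
print(extract_fields(legacy))
print(build_body("firstname:terry smith", ["notes"]))
```

Here "notes" stands in for whichever fields our application would want unqualified terms to search; it is a placeholder, not a real field in my index.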
(FWIW, I have been looking at the source and have not yet found evidence that the TreeMap fieldsAndWeights is ever (or can ever be) populated "automagically" as a result of query string parsing, but I hope I am wrong...)