No detection of fields in query_string query strings results in "field expansion matches too many fields"

Hi,

I am migrating from 5.6.x to 7.5.1.

Testing existing application use of query_string queries (either via DSL or implicitly in URI search) is breaking.

I believe this is because of changes to how these queries are planned, e.g. as a result of the deprecation and removal of the _all field.

I hope to clarify my understanding of what changed and what my options are.

Outside of my control, I have a production index with (too) many fields, in the many-thousands.

AFAI can tell, as of 7.5.1 a trivial query_string string which contains a key-value pair like "firstname:terry" does not result in "automatic detection" that only a single field, firstname, should be searched.

Instead, I get the "field expansion matches too many fields" error. I understand this is because the default_field setting "defaults to the index.query.default_field index setting, which has a default value of *" (https://www.elastic.co/guide/en/elasticsearch/reference/7.5/query-dsl-query-string-query.html#query-string-top-level-params).

In my case I understand (I think) that as a result the query planner is throwing an error instead of quietly producing "many thousands" of OR'd clauses.

What I don't understand, in a nutshell, is why the appropriate default_field scope is not extracted automatically from a trivial query such as my simple key-value example.

Should it be? Can it be?

That is:
• is this "failure" to detect that only a single field is needed in this trivial case a bug, or expected?
• if it's expected, did query_string parsing ever extract the single field? Or were queries always run against the _all field by default, so that it just "quietly worked" (if inefficiently) in the past?
• is there any mechanism today to provoke automatic detection of "relevant fields" from a query, when e.g. it only contains structured queries like key-value pairs?

AFAI can tell, in my case the only way to handle existing legacy queries successfully and efficiently will be to insert a step that formally parses each query to identify all fields explicitly referenced (adding the fields that any unstructured "organic" terms should be searched against, in our application).

Is that indeed the case?
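
A minimal sketch of such a pre-parsing step, in Python. This is not how Elasticsearch itself parses query strings; it is a rough regex approximation, and the function names referenced_fields and with_explicit_fields are made up for illustration. It assumes field names look like plain identifiers (letters, digits, "_", "." and "-"), ignores anything inside double quotes, and does not special-case things like _exists_:field, wildcard field names, or URLs in the query text.

```python
import re

# Quoted phrases can contain colons that are not field prefixes, so blank them out first.
_QUOTED = re.compile(r'"(?:\\.|[^"\\])*"')
# A field prefix: an identifier immediately followed by ':' and not preceded
# by another identifier character or a backslash escape.
_FIELD = re.compile(r'(?<![\w.\-\\])([A-Za-z_][\w.\-]*):')


def referenced_fields(query):
    """Return the field names that appear as 'field:' prefixes, in first-seen order."""
    stripped = _QUOTED.sub('""', query)
    seen = []
    for name in _FIELD.findall(stripped):
        if name not in seen:
            seen.append(name)
    return seen


def with_explicit_fields(query, fallback_fields):
    """Build a query_string body whose 'fields' list is the application's
    fallback fields (for bare terms) plus every explicitly referenced field."""
    fields = list(fallback_fields)
    for name in referenced_fields(query):
        if name not in fields:
            fields.append(name)
    return {"query": {"query_string": {"query": query, "fields": fields}}}


if __name__ == "__main__":
    print(referenced_fields('firstname:terry AND title:"a:b" OR city:(x y)'))
    print(with_explicit_fields("firstname:terry smith", ["lastname"]))
```

The point of the sketch is only that the "fields" list stays small and explicit, so the expansion of * over the whole mapping never happens.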

(Fwiw I have been looking at the source and have not yet found evidence that the TreeMap fieldsAndWeights is ever (or can ever be) populated "automagically" as a result of query string parsing, but I hope I am wrong...)

UPDATE

In testing I have made a critical discovery. It appears that the field-expansion error that occurs when the index has a large field count is being thrown incorrectly, even when the final planned (rewritten) query never makes use of those bool clauses.

In fact AFAI can tell in my example case they are pruned.

This works without throwing the error:

{
  "query": {
    "query_string": {
      "query": "firstname:terry",
      "fields": ["lastname"]
    }
  }
}

as does the URI version, _search?q=firstname:terry&df=lastname

It seems almost certain that the bool clauses which might be needed to handle queries not detectably rewritable as e.g. term queries for specific fields are being built (or not being built, and an exception raised) prior to any analysis of the actual query string.

QED so long as there is no need to actually query against a set of specific, or default, fields, this setting is effectively a no-op and a dummy value can be used(!).

In my case, our application code never actually searched against _all; we had an application-land equivalent with a subset of fields. So if ES v5 was defaulting to _all, this was essentially invisible, and because it was a single speculative clause it cost almost nothing.

It's only when the background implementation changed to * that this behavior came into the foreground.

I would myself call this a bug: the exception should really only be raised if it's actually relevant.

It seems the best workaround will be to change index.query.default_field to an arbitrary value.

That appears to work in my case, thank heavens...!
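
For anyone wanting to try the same workaround, here is a rough Python sketch of the settings update request. It only builds the HTTP method, path and JSON body rather than sending anything; the helper name default_field_settings_request and the index name "my-index" are placeholders, and it assumes index.query.default_field can be updated on a live index through the update index settings API and accepts a list of field names.

```python
import json


def default_field_settings_request(index, fields):
    """Return (method, path, body) for pointing index.query.default_field
    at a short explicit list of fields instead of the default '*'."""
    body = {"index.query.default_field": list(fields)}
    return "PUT", "/{}/_settings".format(index), json.dumps(body)


if __name__ == "__main__":
    method, path, body = default_field_settings_request("my-index", ["lastname"])
    print(method, path)
    print(body)
```

With that in place, queries that name their fields explicitly (like firstname:terry) should no longer trigger expansion over every field, while bare terms would only be searched against the listed field(s).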
