Issue with ES 6.8 supporting _all field within index created in ES 5.6

Hello,

In production, we use Elasticsearch to index our data for search-related purposes. To interact with our current ES 5.6 cluster, we use the Java transport client. We are preparing for an upgrade from ES 5.6 to ES 6.8, and we plan on switching to the Java high-level REST client as part of this process.

According to the ES reference manual, each major version of ES should be able to read and support indices created within its same major version or one prior. So, if I understand this correctly, a cluster of nodes running ES 6.8 should be able to read and support indices created in any minor version of ES 6.x as well as any minor version of ES 5.x. If this is true, then ES 6.8 should have no problem reading and supporting our indices created with ES 5.6.

So far, this has held true except for searches against the _all field. We understand that the _all field is deprecated and that we will be unable to create new indices in ES 6.x with this field enabled or referenced in any form. However, suppose we remove our usage of the _all field in one fell swoop and switch to our own catch-all field, populated through the copy_to mapping parameter. All users would then be forced to reindex before searches against the new catch-all field could succeed. If we do not enforce a rebuild, users would be searching an old index (one created in 5.6) that has no knowledge of the new catch-all field.
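To make the catch-all idea concrete, here is a minimal sketch of the kind of 6.x mapping we have in mind. The index name (new_index) and the field names (title, body, catch_all) are made up for illustration and are not our real mapping:

```json
PUT new_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "title":     { "type": "text", "copy_to": "catch_all" },
        "body":      { "type": "text", "copy_to": "catch_all" },
        "catch_all": { "type": "text" }
      }
    }
  }
}
```

Only documents indexed after this mapping exists have their values copied into catch_all, which is why an index built in 5.6 has to be rebuilt first.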

For example, let's assume that we have an index built in ES 5.6 that has the _all field enabled and in use. Searches against this index that target the _all field will be successful. Now, let's assume that we upgrade to ES 6.8 and release a new version of our code where searches target our new catch-all field instead. Until we reindex the index built in 5.6 so that it has knowledge of the new catch-all field, these searches will fail.
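For reference, the rebuild step could look roughly like the following call to the reindex API. The index names are placeholders, and this assumes the destination index has already been created with the new mapping:

```json
POST _reindex
{
  "source": { "index": "old_index" },
  "dest":   { "index": "new_index" }
}
```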

As a result, we would like to have a transition period where we temporarily keep searching against the _all field in ES 6.8 until all indices are rebuilt without the _all field, thus eliminating the need for it. However, when testing searches against ES 6.8 that target the _all field, we are seeing behavior that is inconsistent with how the same searches worked in ES 5.6.

These issues specifically concern the use of the wildcard operator, *, in query strings that target the _all field.

For example, the following query string search against the _all field for an index in ES 5.6 returns, as expected, all documents where the _all field contains a term ending in "super":

GET index/_search
{
    "query": {
        "query_string" : {
            "query" : "*super",
            "default_field" : "_all",
            "fields" : [ ],
            "use_dis_max" : true,
            "tie_breaker" : 0.0,
            "default_operator" : "or",
            "auto_generate_phrase_queries" : false,
            "max_determinized_states" : 10000,
            "enable_position_increments" : true,
            "fuzziness" : "AUTO",
            "fuzzy_prefix_length" : 0,
            "fuzzy_max_expansions" : 50,
            "phrase_slop" : 0,
            "escape" : false,
            "split_on_whitespace" : true,
            "boost" : 1.0
        }
    }
}

The same query string search (adjusted for unrelated deprecated parameters) against the _all field of the same index, now running in ES 6.8, instead returns every single document in the index, as if it makes no attempt to check whether "super" is in the _all field:

GET index/_search
{
    "query": {
        "query_string" : {
            "query" : "*super",
            "default_field" : "_all",
            "fields" : [ ],
            "type" : "best_fields",
            "default_operator" : "or",
            "max_determinized_states" : 10000,
            "enable_position_increments" : true,
            "fuzziness" : "AUTO",
            "fuzzy_prefix_length" : 0,
            "fuzzy_max_expansions" : 50,
            "phrase_slop" : 0,
            "escape" : false,
            "auto_generate_synonyms_phrase_query" : true,
            "fuzzy_transpositions" : true,
            "boost" : 1.0
        }
    }
}

It's almost as if the query string "*super" is now being tokenized at some point to be separated into "*" and "super", thus giving back every single document in the index rather than only those that have a string that in some way ends with super in the _all field.

So, my question is, what has changed to cause this behavior when searching against the _all field? Are wildcard operators now treated/tokenized differently? Is this a known bug?
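One way to check the tokenization guess would be to run the same query through the validate API with explain enabled on both clusters and compare the Lucene query each version reports in the "explanations" section of the response. A trimmed-down request would be something like:

```json
GET index/_validate/query?explain=true
{
  "query": {
    "query_string": {
      "query": "*super",
      "default_field": "_all"
    }
  }
}
```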

This happens whether the _all field is specified as the default_field or in the "fields":[] collection. Interestingly, putting the * operator at the end of the term does not result in the same behavior. In that case, it's as if the * is ignored.

Additionally, the wildcard operator, *, seems to work as expected in queries against all fields other than the _all field. It is only when we have a * in the query string for a query against the _all field that we start to get unexpected results, i.e. every document in the index.
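For comparison, this is the shape of the query that behaves correctly on the same 6.8 cluster. Here title is a stand-in for one of our regular text fields, not an actual field name from our mapping:

```json
GET index/_search
{
  "query": {
    "query_string": {
      "query": "*super",
      "default_field": "title"
    }
  }
}
```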

Thank you for your time and any help

Evan
