"query_string" doesn't analyze wildcard queries

Alexander_Ott · November 29, 2017, 8:08am

Hi,

after upgrading from ES 2.4.4 to ES 5.6.1, the following query is no longer analyzed by our analyzer:

{
  "analyzer": {
    "default": {
      "type": "custom",
      "tokenizer": "fesadTokenizer",
      "char_filter": [
        "german_special",
        "strip_multiple_chars"
      ],
      "filter": [
        "asciifolding",
        "lowercase"
      ]
    }
  }
}

{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "*chifffahrt",
            "default_field": "_all",
            "allow_leading_wildcard": true,
            "analyze_wildcard": true
          }
        }
      ]
    }
  }
}

I think the reason is https://github.com/elastic/elasticsearch/commit/14d2b122a1b91ba14fc13fbb103e553cfddd5778#diff-3d1f1c3586dc5fe67e7615030feaecec

With this change, getPossiblyAnalyzedWildcardQuery(indexedNameField, termStr) was removed and org.apache.lucene.analysis.Analyzer#normalize(java.lang.String, java.lang.String) is used instead.

The configured char filters and token filters are not called, which is a problem for us because we no longer get any hits.

Is this the expected behaviour in ES 5.6.1, or is it a bug or a configuration error?

jpountz · November 29, 2017, 5:21pm

At least the lowercase and asciifolding filters should be applied. I can't tell about the char filters and your tokenizer since I don't know how they are implemented.

Can you share the output of the following request with both versions?

GET your_index_name/_validate/query?rewrite=true
{
  "query" : {
    "query_string": {
      "query": "*chifffahrt",
      "default_field": "_all",
      "allow_leading_wildcard": true,
      "analyze_wildcard": true
    }
  }
}

Alexander_Ott · November 30, 2017, 6:44am

The solution is that our own CharFilterFactory, which extends org.elasticsearch.index.analysis.AbstractCharFilterFactory, must also implement the interface org.elasticsearch.index.analysis.MultiTermAwareComponent, which it did not do so far.

The reason is that org.apache.lucene.analysis.Analyzer#normalize(java.lang.String, java.lang.String) calls org.elasticsearch.index.analysis.CustomAnalyzer#initReaderForNormalization, where only CharFilterFactory instances that are instanceof MultiTermAwareComponent are used:

        @Override
        protected Reader initReaderForNormalization(String fieldName, Reader reader) {
          for (CharFilterFactory charFilter : charFilters) {
            if (charFilter instanceof MultiTermAwareComponent) {
              charFilter = (CharFilterFactory) ((MultiTermAwareComponent) charFilter).getMultiTermComponent();
              reader = charFilter.create(reader);
            }
          }
          return reader;
        }

The same applies to the TokenFilterFactory instances:

        @Override
        protected TokenStream normalize(String fieldName, TokenStream in) {
          TokenStream result = in;
          for (TokenFilterFactory filter : tokenFilters) {
            if (filter instanceof MultiTermAwareComponent) {
              filter = (TokenFilterFactory) ((MultiTermAwareComponent) filter).getMultiTermComponent();
              result = filter.create(result);
            }
          }
          return result;
        }
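
For anyone who wants to see the effect in isolation, here is a minimal, self-contained sketch of the instanceof check shown above. It does not use the real Elasticsearch classes: the two interfaces are simplified stand-ins with the same names, and CollapseFilter is a hypothetical char filter that only guesses at what something like "strip_multiple_chars" might do (collapsing runs of three or more identical characters to two). It only illustrates that a factory which does not implement MultiTermAwareComponent is silently skipped during normalization.

```java
import java.util.Collections;
import java.util.List;

final class NormalizationSketch {

    // Simplified stand-ins for the Elasticsearch interfaces (NOT the real API).
    interface CharFilterFactory {
        String apply(String input);
    }

    interface MultiTermAwareComponent {
        Object getMultiTermComponent();
    }

    // Hypothetical char filter: collapses runs of 3+ identical characters to 2.
    // It does not implement MultiTermAwareComponent, like the factory before the fix.
    static class CollapseFilter implements CharFilterFactory {
        @Override
        public String apply(String input) {
            return input.replaceAll("(.)\\1{2,}", "$1$1");
        }
    }

    // Same filter, but it opts in to wildcard-term normalization (the fix).
    static class AwareCollapseFilter extends CollapseFilter implements MultiTermAwareComponent {
        @Override
        public Object getMultiTermComponent() {
            return this;
        }
    }

    // Mirrors the shape of the loop in initReaderForNormalization:
    // only factories that are MultiTermAwareComponent are applied.
    static String normalize(String term, List<CharFilterFactory> charFilters) {
        String result = term;
        for (CharFilterFactory charFilter : charFilters) {
            if (charFilter instanceof MultiTermAwareComponent) {
                CharFilterFactory multiTerm =
                    (CharFilterFactory) ((MultiTermAwareComponent) charFilter).getMultiTermComponent();
                result = multiTerm.apply(result);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<CharFilterFactory> before = Collections.<CharFilterFactory>singletonList(new CollapseFilter());
        List<CharFilterFactory> after = Collections.<CharFilterFactory>singletonList(new AwareCollapseFilter());
        // Before the fix the filter is skipped; after the fix it is applied.
        System.out.println(normalize("*chifffahrt", before));
        System.out.println(normalize("*chifffahrt", after));
    }
}
```

In the real plugin the equivalent change is just adding "implements MultiTermAwareComponent" to the custom factory and returning a suitable component from getMultiTermComponent(); whether returning the factory itself is appropriate depends on the filter.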

jpountz · November 30, 2017, 9:33am

This is correct. Do I get it right that it fixed your issue?

Alexander_Ott · November 30, 2017, 9:50am

Yes, this fixed my issue.

Thanks

