"query_string" doesn't analyze wildcard queries

Hi,

After upgrading from ES 2.4.4 to ES 5.6.1, the following query is no longer analyzed by our analyzer:

{
  "analyzer": {
    "default": {
      "type": "custom",
      "tokenizer": "fesadTokenizer",
      "char_filter": [
        "german_special",
        "strip_multiple_chars"
      ],
      "filter": [
        "asciifolding",
        "lowercase"
      ]
    }
  }
}

{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "*chifffahrt",
            "default_field": "_all",
            "allow_leading_wildcard": true,
            "analyze_wildcard": true
          }
        }
      ]
    }
  }
}

I think the reason is https://github.com/elastic/elasticsearch/commit/14d2b122a1b91ba14fc13fbb103e553cfddd5778#diff-3d1f1c3586dc5fe67e7615030feaecec

With this change, getPossiblyAnalyzedWildcardQuery(indexedNameField, termStr) was removed and org.apache.lucene.analysis.Analyzer#normalize(java.lang.String, java.lang.String) is used instead.

The configured char filters and token filters are no longer called, which is a problem for us because the query returns no hits.

Is this behaviour expected in ES 5.6.1, or is it a bug or a configuration error?

At least the lowercase and asciifolding filters should be applied. I can't tell about the char filters and your tokenizer, since I don't know how they are implemented.

Can you share the output of the following request with both versions?

GET your_index_name/_validate/query?rewrite=true
{
  "query" : {
    "query_string": {
      "query": "*chifffahrt",
      "default_field": "_all",
      "allow_leading_wildcard": true,
      "analyze_wildcard": true
    }
  }
}

The solution is that our own CharFilterFactory, which extends org.elasticsearch.index.analysis.AbstractCharFilterFactory, must also implement the interface org.elasticsearch.index.analysis.MultiTermAwareComponent. So far it did not.

The reason is that org.apache.lucene.analysis.Analyzer#normalize(java.lang.String, java.lang.String) calls org.elasticsearch.index.analysis.CustomAnalyzer#initReaderForNormalization, which only uses CharFilterFactory instances that are also instanceof MultiTermAwareComponent:

        @Override
        protected Reader initReaderForNormalization(String fieldName, Reader reader) {
          for (CharFilterFactory charFilter : charFilters) {
            if (charFilter instanceof MultiTermAwareComponent) {
              charFilter = (CharFilterFactory) ((MultiTermAwareComponent) charFilter).getMultiTermComponent();
              reader = charFilter.create(reader);
            }
          }
          return reader;
        }

The same applies to the TokenFilterFactory implementations:

        @Override
        protected TokenStream normalize(String fieldName, TokenStream in) {
          TokenStream result = in;
          for (TokenFilterFactory filter : tokenFilters) {
            if (filter instanceof MultiTermAwareComponent) {
              filter = (TokenFilterFactory) ((MultiTermAwareComponent) filter).getMultiTermComponent();
              result = filter.create(result);
            }
          }
          return result;
        }
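
To illustrate why a char filter is silently skipped for wildcard terms, here is a minimal, self-contained sketch of the selection logic above. It does not use the real Elasticsearch classes: the two interfaces are simplified stand-ins that work on plain strings instead of Reader, and the collapsing filter is a hypothetical example whose behaviour is an assumption, not the actual strip_multiple_chars implementation.

```java
import java.util.Collections;
import java.util.List;

public class NormalizationSketch {

    // Simplified stand-in for CharFilterFactory (assumption: String in, String out).
    public interface CharFilterFactory {
        String apply(String input);
    }

    // Simplified stand-in for MultiTermAwareComponent.
    public interface MultiTermAwareComponent {
        Object getMultiTermComponent();
    }

    // Hypothetical char filter that collapses runs of a repeated character.
    // It does NOT opt in to multi-term normalization.
    public static class PlainCollapsingCharFilter implements CharFilterFactory {
        @Override
        public String apply(String input) {
            return input.replaceAll("(.)\\1+", "$1");
        }
    }

    // Same filter, but it opts in by implementing the marker interface.
    public static class MultiTermAwareCollapsingCharFilter extends PlainCollapsingCharFilter
            implements MultiTermAwareComponent {
        @Override
        public Object getMultiTermComponent() {
            return this;
        }
    }

    // Mirrors the loop in initReaderForNormalization: filters that are not
    // MultiTermAwareComponent are skipped entirely.
    public static String normalize(List<? extends CharFilterFactory> charFilters, String term) {
        String result = term;
        for (CharFilterFactory charFilter : charFilters) {
            if (charFilter instanceof MultiTermAwareComponent) {
                charFilter = (CharFilterFactory) ((MultiTermAwareComponent) charFilter).getMultiTermComponent();
                result = charFilter.apply(result);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String term = "*chifffahrt";
        // Filter is skipped, the wildcard term stays unchanged.
        System.out.println(normalize(Collections.singletonList(new PlainCollapsingCharFilter()), term));
        // Filter is applied to the wildcard term.
        System.out.println(normalize(Collections.singletonList(new MultiTermAwareCollapsingCharFilter()), term));
    }
}
```

In this model the first call leaves the term untouched while the second one rewrites it, which matches what we saw: the indexed terms had gone through the char filter, but the wildcard term had not, so nothing matched.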

That is correct. Am I right that this fixed your issue?

Yes, this fixed my issue.

Thanks
