Query_string with wildcard not working as expected (or wrong understanding of analyze_wildcard)

Hi, I am wondering why the following query does not return a hit. Here is the reproducer:

// put index
PUT /test
{
  "mappings" : {
    "properties" : {
        "title": { 
          "type": "text", 
          "analyzer": "german"        
        }
    }
  }
}

// put test doc
POST /test/_doc
{
  "title": "Foober Baren"
}

GET /_analyze 
{
  "analyzer": "german",
  "text": "Foober Baren"
}
// Tokens are "foob" and "bar" as expected

GET /test/_search
{
  "query": {
    "query_string": {
      "default_field": "title",
      "analyze_wildcard": true, 
      "query": "*oober"
    }
  }
}

If I change the query to *oob, it does hit. With analyze_wildcard enabled, I would have expected the wildcard term to be analyzed as well. If I check how it would be analyzed:

GET /_analyze 
{
  "analyzer": "german",
  "text": "oober"
}
// yields "oob" as token, as expected

So *oober, once analyzed, should become *oob and also hit. Did I misunderstand analyze_wildcard?

Interestingly enough, the query

GET /test/_search
{
  "query": {
    "query_string": {
      "default_field": "title",
      "analyze_wildcard": true, 
      "query": "foobe*"
    }
  }
}

does hit, which would suggest that foobe* is analyzed to foob* and thus matches the foob token of the document.

It seems that terms with a leading wildcard are only lowercased before matching, while terms with a trailing wildcard are lowercased and analyzed. That would be weirdly inconsistent behaviour between the two. Can anyone confirm, deny, or explain this observation?

It would be awesome if anyone could explain why leading wildcards are analyzed differently from trailing wildcards, and whether there is a reason for this or it should rather be reported as unexpected behavior / a bug.

Thank you!

The Elasticsearch query_string query passes the query text to the analyzer for processing. However, terms with wildcards are not passed to the analyzer, whether the wildcard is leading or not. This explains the difference and the issues you are seeing: the analyzer is what transforms a token into its stem form in the OP's example.

This is not quite correct. This is what the analyze_wildcard parameter is for, as described in the Elastic documentation (which you should also link to, instead of promoting your own company with links).

I'm actually spot on :) Let me break it down:

  • During indexing, the analyzer outputs foob to the index. There is no foober.
  • During search, *oober is received, but the analyzer has nothing to do with it. No stemming algorithm can be executed here, as the whole logic depends on the full word structure, which is missing due to the wildcard.
  • The analyzer is skipped, the query is rewritten as a boolean query with a leading wildcard query for this term, and no match is found.
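
One way to check this breakdown against a running cluster, as a sketch that assumes the test index from the reproducer above still exists: the validate API with explain=true reports the Lucene query that Elasticsearch builds from a query string, so the leading and trailing wildcard cases can be compared directly. No output is shown here, since the exact explanation text may differ between Elasticsearch versions.

```
// leading wildcard: inspect how *oober is rewritten
GET /test/_validate/query?explain=true
{
  "query": {
    "query_string": {
      "default_field": "title",
      "analyze_wildcard": true,
      "query": "*oober"
    }
  }
}

// trailing wildcard: inspect how foobe* is rewritten
GET /test/_validate/query?explain=true
{
  "query": {
    "query_string": {
      "default_field": "title",
      "analyze_wildcard": true,
      "query": "foobe*"
    }
  }
}
```

Comparing the explanation strings in the two responses should show whether the term was stemmed before the wildcard or prefix query was built.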