'query_string' does not return records when using a wildcard since 8.9 (the 'wildcard' query does)

Hi,

We are testing a server upgrade to 8.12.2 and found that some of our queries stopped working. We use 'query_string' with wildcards to search for user-requested data. I have prepared some test data to show the problem we are facing.

Let's assume we have an index definition like this:

PUT test
{
  "settings": {
    "index": {
      "number_of_shards": "1",
      "number_of_replicas": "1",
      "analysis": {
        "filter": {
          "appendZeros": {
            "type": "pattern_replace",
            "pattern": "^(a?)(\\d+)(\\D{3})?$",
            "replacement": "$1$2\u006100$3 $1$2$3"
          },
          "appendZero": {
            "type": "pattern_replace",
            "pattern": "^(a?)(\\d+a\\d)(\\D{3})?$",
            "replacement": "$1$20$3 $1$2$3"
          },
          "divideToken": {
            "type": "pattern_capture",
            "preserve_original": false,
            "patterns": [
              "(\\S+) (\\S+)"
            ]
          }
        },
        "analyzer": {
          "currencyAnalyzer": {
            "type": "custom",
            "tokenizer": "keyword",
            "char_filter": [
              "replaceSpecialCharacters"
            ],
            "filter": [
              "lowercase",
              "appendZeros",
              "appendZero",
              "divideToken"
            ]
          },
          "textAnalyzer": {
            "type": "custom",
            "tokenizer": "standard",
            "char_filter": [
              "replaceSpecialCharacters"
            ],
            "filter": [
              "lowercase"
            ]
          }
        },
        "char_filter": {
          "replaceSpecialCharacters": {
            "type": "mapping",
            "mappings": [
              ".=>\u0061",
              ",=>\u0061"
            ]
          }
        },
        "normalizer": {
          "lowercaseNormalizer": {
            "type": "custom",
            "filter": [
              "lowercase"
            ]
          }
        }
      }
    }
  },
  "mappings": {
    "dynamic": false,
    "properties": {
      "amount": {
        "properties": {
          "amount": {
            "type": "scaled_float",
            "scaling_factor": 100,
            "copy_to": "amountValue"
          },
          "currency": {
            "type": "keyword",
            "copy_to": "amountValue"
          }
        }
      },
      "amountValue": {
        "type": "text",
        "analyzer": "currencyAnalyzer",
        "store": false
      }
    }
  }
}
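To make the analysis chain easier to follow, here is a rough Python approximation of currencyAnalyzer. It is a sketch, not Elasticsearch's implementation: Python's `re` stands in for the Java regex engine, and the replacement strings are rewritten in Python's `\g<n>` syntax. One detail worth noting is that in Java replacement strings `$20` is read as "group 2 followed by a literal 0" when the pattern has fewer than 20 groups. The function name `currency_analyzer` is hypothetical.

```python
import re

# Rough approximation of "currencyAnalyzer" (not Elasticsearch's own code).
# char_filter "replaceSpecialCharacters": "." and "," are mapped to "a" (\u0061).
SPECIAL_CHARS = str.maketrans({".": "a", ",": "a"})

# "appendZeros": Java replacement "$1$2\u006100$3 $1$2$3" == "$1$2a00$3 $1$2$3".
APPEND_ZEROS = (re.compile(r"^(a?)(\d+)(\D{3})?$"),
                r"\g<1>\g<2>a00\g<3> \g<1>\g<2>\g<3>")

# "appendZero": with only 3 groups, Java reads "$20" as group 2 plus a literal "0".
APPEND_ZERO = (re.compile(r"^(a?)(\d+a\d)(\D{3})?$"),
               r"\g<1>\g<2>0\g<3> \g<1>\g<2>\g<3>")

# "divideToken" (pattern_capture with preserve_original: false).
DIVIDE_TOKEN = re.compile(r"(\S+) (\S+)")


def currency_analyzer(text: str) -> list[str]:
    # char filter first, then the keyword tokenizer keeps the whole input
    # as a single token, then the lowercase filter
    token = text.translate(SPECIAL_CHARS).lower()
    # pattern_replace filters, applied in the configured order
    for pattern, replacement in (APPEND_ZEROS, APPEND_ZERO):
        token = pattern.sub(replacement, token)
    match = DIVIDE_TOKEN.search(token)
    if match is None:
        # pattern_capture keeps the original token when no pattern matches
        return [token]
    # otherwise only the capture groups are emitted
    return [match.group(1), match.group(2)]


if __name__ == "__main__":
    for value in ("123.45", "123.4", "USD", "123,4 123.45"):
        print(value, "->", currency_analyzer(value))
```

In this approximation 123.45 passes through unchanged as 123a45, while 123.4 triggers appendZero and divideToken splits the result into 123a40 and 123a4. The combined string "123,4 123.45" yields 123a4 and 123a45, which agrees with the _analyze output shown in the post.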

Let's add some data:

PUT test/_doc/1
{
  "amount":{
    "amount":123.45,
    "currency": "USD"
  }
}

PUT test/_doc/2
{
  "amount":{
    "amount":123.4,
    "currency": "EUR"
  }
}

Now query the data with a wildcard query:

POST test/_search
{
  "query":{
    "wildcard": {
      "amountValue":{
        "value": "123a4*"
      }
    }
  }
}

As a result, we got: "hits": { "total": {"value": 2, ... }}
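The wildcard query matches its pattern against the terms stored in the index. The sketch below models that with a plain regular expression, where `*` matches any sequence of characters and `?` exactly one, and every other character is compared literally against the whole term. This is a simplification of what Elasticsearch actually does. The terms for document 1 come from the term vectors later in the thread; the terms for document 2 (`eur`, `123a40`, `123a4`) are an assumption, worked out by hand from the filter definitions. The helper names are hypothetical.

```python
import re

# Indexed amountValue terms per document. Doc 1 matches the term vectors in the
# thread; doc 2's terms are an assumption derived by hand from the filters.
INDEXED_TERMS = {
    1: ["usd", "123a45"],
    2: ["eur", "123a40", "123a4"],
}


def wildcard_to_regex(pattern: str) -> re.Pattern:
    # "*" = any sequence of characters, "?" = exactly one character;
    # every other character is matched literally.
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def wildcard_hits(pattern: str) -> list[int]:
    # A document matches if any of its terms matches the whole pattern.
    regex = wildcard_to_regex(pattern)
    return sorted(doc for doc, terms in INDEXED_TERMS.items()
                  if any(regex.fullmatch(term) for term in terms))


if __name__ == "__main__":
    print(wildcard_hits("123a4*"))
```

With these terms, 123a4* matches 123a45 in document 1 and both 123a40 and 123a4 in document 2, so both documents are hits, as in the result above.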

Now let's run the same search with query_string:

POST test/_search
{
  "query":{
    "query_string": {
      "fields": ["amountValue"], 
      "query": "123a4*"
    }
  }
}

This time we got: "hits": {"total": {"value": 0, ...}}

We tested earlier versions of Elasticsearch, and this behaviour first appears in 8.9.

Now let's check the tokens for the values we inserted:

POST test/_analyze
{
  "text":"123,4 123.45",
  "field": "amountValue"
}

And we got:

      "token": "123a4",
      "token": "123a45",

These are exactly the terms we are looking for with query_string. So why does the query not return the requested data? Is this some kind of bug, or are my index settings defined incorrectly?

Any advice would be greatly appreciated, as we have been struggling with this for some time.

I would set "store": true on the amountValue field to see what the copy_to function actually generates.

But in any case, to solve this I'd use an ingest pipeline and set the field from the two other fields. That gives you better control over how the data is copied.
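A minimal sketch of such a pipeline, written as a Python dict only so it can be printed as JSON and pasted into Dev Tools. The pipeline name `amount-value` and the "<amount> <currency>" format of the combined value are assumptions. The `set` processor and its mustache templates (triple braces avoid HTML escaping) are standard ingest features, but whether this format tokenizes the way currencyAnalyzer expects should be checked with _analyze before relying on it.

```python
import json

# Hypothetical ingest pipeline body; the name "amount-value" and the
# "<amount> <currency>" format are assumptions, not taken from the thread.
# The "set" processor builds amountValue from the two source fields.
AMOUNT_VALUE_PIPELINE = {
    "description": "Build amountValue from amount.amount and amount.currency",
    "processors": [
        {
            "set": {
                "field": "amountValue",
                "value": "{{{amount.amount}}} {{{amount.currency}}}",
            }
        }
    ],
}

if __name__ == "__main__":
    # Print the JSON body to paste after "PUT _ingest/pipeline/amount-value".
    print(json.dumps(AMOUNT_VALUE_PIPELINE, indent=2))
```

The pipeline could then be applied per request with ?pipeline=amount-value or through the index.default_pipeline setting; the copy_to entries in the mapping would presumably be removed so that amountValue is not filled twice.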

Thanks for your answer.

But the copy_to function works as intended. The problem occurs even when I add data directly to this field (with store changed to true).

PUT test/_doc/4
{
  "amountValue":"123.4"
}

query_string still returns no hits.

I called:

POST /test/_termvectors/1
{
  "fields" : ["amountValue"]
}

and received:

  "term_vectors": {
    "amountValue": {
      "field_statistics": {
        "sum_doc_freq": 11,
        "doc_count": 5,
        "sum_ttf": 11
      },
      "terms": {
        "123a45": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 101,
              "start_offset": 4,
              "end_offset": 10
            }
          ]
        },
        "usd": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 0,
              "start_offset": 0,
              "end_offset": 3
            }
          ]
        }
      }
    }
  }
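The term vectors response can also be read programmatically. The sketch below copies the excerpt above into a Python dict (wrapped in an outer object) and lists each term with its positions; the helper name `term_positions` is hypothetical. One possible reading, assuming the default position_increment_gap of 100 for text fields: usd at position 0 and 123a45 at position 101 (0 + 100 + 1) is consistent with copy_to indexing the currency and the amount as two separate values of amountValue.

```python
# Excerpt of the _termvectors response for document 1, copied from the output above.
RESPONSE = {
    "term_vectors": {
        "amountValue": {
            "field_statistics": {"sum_doc_freq": 11, "doc_count": 5, "sum_ttf": 11},
            "terms": {
                "123a45": {
                    "term_freq": 1,
                    "tokens": [{"position": 101, "start_offset": 4, "end_offset": 10}],
                },
                "usd": {
                    "term_freq": 1,
                    "tokens": [{"position": 0, "start_offset": 0, "end_offset": 3}],
                },
            },
        }
    }
}


def term_positions(response: dict, field: str) -> dict[str, list[int]]:
    # Map each indexed term of the field to the positions it occupies.
    terms = response["term_vectors"][field]["terms"]
    return {term: [tok["position"] for tok in info["tokens"]]
            for term, info in terms.items()}


positions = term_positions(RESPONSE, "amountValue")
# Terms ordered by their first position within the field.
ordered = sorted(positions, key=lambda term: min(positions[term]))

if __name__ == "__main__":
    print(ordered, positions)
```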

All of this leads to the conclusion that query_string works differently from the wildcard query, but I still don't know why.

For now I have found a workaround: I simply set the standard analyzer on the query_string query:

POST test/_search
{
  "query":{
    "query_string": {
      "fields": ["amountValue"], 
      "query": "123a4*",
      "analyzer": "standard"
    }
  }
}

and voila: "hits": {"total": {"value": 2...}

My question still stands: why does the wildcard query behave differently from query_string for the same index, data and query?
