Wildcards and mapping in query_string query

Hello,

I have a case where I use wildcards in the query_string query. I am seeing a behaviour that I can't explain.
I create a simple index as follows:

curl -x "" -k -XPUT 'localhost:9200/wildcardtest' -d '{
  "mappings": {
    "tweet": {
      "properties": {
        "message_not_analyzed": {
          "type": "string",
          "index": "not_analyzed"
        },
        "message_analyzed": {
          "type": "string",
          "index": "analyzed"
        }
      }
    }
  }
}'

Then I put one document:
curl -x "" -XPUT localhost:9200/wildcardtest/tweet/1 -d '{
  "message_not_analyzed": "M1000",
  "message_analyzed": "M1000"
}'

Then I search. The match_all query matches:
curl -x "" localhost:9200/wildcardtest/_search -d '{
  "query": {
    "match_all": {}
  }
}'
{"took":116,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":1.0,"hits":[{"_index":"wildcardtest","_type":"tweet","_id":"1","_score":1.0,"_source":{
"message_not_analyzed": "M1000",
"message_analyzed": "M1000"
}}]}}

Then I use the query_string query with a wildcard on the analyzed field, and it matches:
curl -x "" localhost:9200/wildcardtest/_search -d '{
  "query": {
    "query_string": { "default_field": "message_analyzed", "query": "M1*" }
  }
}'
{"took":10,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":1.0,"hits":[{"_index":"wildcardtest","_type":"tweet","_id":"1","_score":1.0,"_source":{
"message_not_analyzed": "M1000",
"message_analyzed": "M1000"
}}]}}

However, when I run the same wildcard query on the "message_not_analyzed" field, it does not match:
curl -x "" localhost:9200/wildcardtest/_search -d '{
  "query": {
    "query_string": { "default_field": "message_not_analyzed", "query": "M1*" }
  }
}'
{"took":14,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}

If I change the query to "M*" on the analyzed field, it matches:
curl -x "" localhost:9200/wildcardtest/_search -d '{
  "query": {
    "query_string": { "default_field": "message_analyzed", "query": "M*" }
  }
}'
{"took":18,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":1,"max_score":1.0,"hits":[{"_index":"wildcardtest","_type":"tweet","_id":"1","_score":1.0,"_source":{
"message_not_analyzed": "M1000",
"message_analyzed": "M1000"
}}]}}

Can anyone explain the above behaviour?

Best regards,

Klearchos

My guess is that the default analyzer probably indexed M1000 as two tokens, m and 1000.
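To sanity-check that guess without a cluster, here is a rough Python approximation of the two fields. It assumes the standard analyzer splits on word boundaries and lowercases each token, and that a not_analyzed field stores the value verbatim. This is a simplification, not the real UAX #29 tokenizer, and `approx_standard_analyze` / `approx_not_analyzed` are hypothetical names. Under Unicode word-boundary rules, letters and digits that sit next to each other stay in one word, so the expected result for `M1000` is the single token `m1000` rather than `m` and `1000`.

```python
import re

# Rough approximation (assumption): a run of letters/digits forms one token and
# anything else is a separator. The real standard tokenizer follows UAX #29,
# which is more detailed but also keeps "M1000" together.
_WORD = re.compile(r"[^\W_]+")


def approx_standard_analyze(text: str) -> list[str]:
    """Hypothetical stand-in for the standard analyzer: tokenize, then lowercase."""
    return [match.group(0).lower() for match in _WORD.finditer(text)]


def approx_not_analyzed(text: str) -> list[str]:
    """A not_analyzed field indexes the whole value verbatim as one term."""
    return [text]


if __name__ == "__main__":
    print(approx_standard_analyze("M1000"))  # ['m1000']
    print(approx_not_analyzed("M1000"))      # ['M1000']
```

The key difference this model highlights is case, not splitting: the analyzed field holds lowercase `m1000`, while the not_analyzed field holds `M1000` exactly as sent.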

Use the _analyze API to see which tokens each field actually produces.
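If `_analyze` shows the analyzed field holding the lowercase token `m1000`, the remaining piece is how `query_string` treats wildcard terms. As far as I know, in the Elasticsearch versions that still used `"type": "string"` with `"index": "not_analyzed"` (1.x/2.x), wildcard and prefix terms are not passed through the field's analyzer, but they are lowercased by default (the `lowercase_expanded_terms` option, default `true`). Under that assumption `M1*` becomes `m1*`, which matches `m1000` but not the verbatim `M1000`. The toy model below reproduces the three results from the question. It is not Elasticsearch code, and `INDEXED_TERMS` and `wildcard_matches` are hypothetical names.

```python
from fnmatch import fnmatchcase

# Toy model of the terms each field holds in the inverted index
# (assumption: the standard analyzer stored "m1000", not_analyzed stored "M1000").
INDEXED_TERMS = {
    "message_analyzed": {"m1000"},
    "message_not_analyzed": {"M1000"},
}


def wildcard_matches(field: str, pattern: str, lowercase_expanded_terms: bool = True) -> bool:
    """Hypothetical model of a query_string wildcard term on one field.

    The wildcard term is NOT run through the field's analyzer; it is only
    lowercased when lowercase_expanded_terms is true (the assumed default).
    """
    if lowercase_expanded_terms:
        pattern = pattern.lower()
    # Case-sensitive glob match against the exact indexed terms.
    return any(fnmatchcase(term, pattern) for term in INDEXED_TERMS[field])


if __name__ == "__main__":
    for field, pattern in [
        ("message_analyzed", "M1*"),      # matches in the question
        ("message_not_analyzed", "M1*"),  # no match in the question
        ("message_analyzed", "M*"),       # matches in the question
    ]:
        print(field, pattern, wildcard_matches(field, pattern))
```

In this model, passing `lowercase_expanded_terms=False` makes `M1*` match the not_analyzed term instead, while the analyzed field (which only holds lowercase terms) stops matching. In a real query the equivalent would be adding `"lowercase_expanded_terms": false` to the `query_string` body, assuming your version still supports that option.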
