ES 5.0 wildcard query, problem with CJK characters

Hacksign · December 13, 2016, 5:05am

Suppose there is a document in ES like this:

    {"realname": "XZY"}

Note: X, Z and Y are CJK characters, not English letters.

To find the document above, I wrote this DSL:

    {
        "size" : 10,
        "query" : {
            "wildcard" : {
                "realname" : "X*"
            }
        }
    }

This works fine, but if the DSL is like this:

    {
        "size" : 10,
        "query" : {
            "wildcard" : {
                "realname" : "X*Y"
            }
        }
    }

nothing is found.

Is something wrong, or have I misunderstood something in the documentation?

dadoonet · December 13, 2016, 6:02am

Try the _analyze API to see how your document is actually indexed.
Then remember that the wildcard pattern is not analyzed, so it is compared as-is against the tokens in that output.

Finally: don't use wildcards!
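A minimal Python sketch of the comparison described above, under a simplified model: the wildcard pattern stays one unanalyzed string and must match a whole indexed term, with `*` matching any run of characters and `?` exactly one. The helper names `wildcard_to_regex` and `matching_terms` are hypothetical, backslash escaping is ignored, and this is not how Lucene implements the query internally (it intersects an automaton with the term dictionary).

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a wildcard pattern into an anchored regex.

    Simplified: '*' -> any run of characters, '?' -> exactly one character,
    everything else is matched literally (no backslash escaping support).
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def matching_terms(pattern: str, terms: list[str]) -> list[str]:
    """Return the indexed terms that the whole, unanalyzed pattern matches."""
    regex = wildcard_to_regex(pattern)
    return [t for t in terms if regex.fullmatch(t)]


# Terms reported by _analyze for the value "XZY"
# (X, Z and Y are placeholders for CJK characters).
terms = ["X", "Z", "Y"]

print(matching_terms("X*", terms))   # ['X']
print(matching_terms("X*Y", terms))  # []
```

The pattern is never split into tokens, so "X*Y" needs a single term that starts with X and ends with Y, and none of the one-character terms qualifies.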

Hacksign · December 13, 2016, 8:32am

Thanks for the reply.
This is the output of _analyze:

Again, X, Z and Y stand for CJK characters.

    [root@host ~]# curl http://localhost:9200/dbs/_analyze?pretty -d '{"field":"some_field", "text":"XZY"}'
    {
      "tokens" : [
        {
          "token" : "X",
          "start_offset" : 0,
          "end_offset" : 1,
          "type" : "<IDEOGRAPHIC>",
          "position" : 0
        },
        {
          "token" : "Z",
          "start_offset" : 1,
          "end_offset" : 2,
          "type" : "<IDEOGRAPHIC>",
          "position" : 1
        },
        {
          "token" : "Y",
          "start_offset" : 2,
          "end_offset" : 3,
          "type" : "<IDEOGRAPHIC>",
          "position" : 2
        }
      ]
    }

What confuses me is that under Elasticsearch 2.3, a wildcard search like this:

    {
        "size" : 10,
        "query" : {
            "wildcard" : {
                "realname" : "X*Y"
            }
        }
    }

returns results.

But after upgrading ES to 5.0, only queries like the one below return results:

    {
        "size" : 10,
        "query" : {
            "wildcard" : {
                "realname" : "X*"
            }
        }
    }

If this is a problem with the mapping and word segmentation, why does "X*" find results while "X*Y" does not?

dadoonet · December 13, 2016, 8:54am

I don't know how it worked previously in the 2.x series.
Maybe the analyzer you were using was producing [ "XZY" ] instead of [ "X", "Z", "Y" ]?

Hacksign · December 16, 2016, 2:44am

As the _analyze API output shows, the CJK text is analyzed as ['X', 'Z', 'Y'], not ['XZY'].
This seems to be the default analyzer behaviour (splitting CJK text into single characters, one after another).
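To make the "one token per CJK character" behaviour concrete, here is a rough Python imitation of the `_analyze` output shape shown above. It is a sketch under loud assumptions, not the real standard tokenizer: it only recognises the basic CJK Unified Ideographs block (U+4E00 to U+9FFF) and treats any other run of letters or digits as one `<ALPHANUM>` token, whereas Lucene's tokenizer follows the Unicode UAX #29 word-break rules. `is_cjk_ideograph` and `simple_tokenize` are hypothetical names, and the sample text is only an example.

```python
import json


def is_cjk_ideograph(ch: str) -> bool:
    # Simplification: only the basic CJK Unified Ideographs block.
    return 0x4E00 <= ord(ch) <= 0x9FFF


def simple_tokenize(text: str) -> list[dict]:
    """Rough imitation of _analyze output: each CJK ideograph becomes its own
    token; a run of other letters/digits becomes one token; anything else
    (spaces, punctuation) only separates tokens."""
    tokens = []
    i = 0
    while i < len(text):
        if is_cjk_ideograph(text[i]):
            start, end, kind = i, i + 1, "<IDEOGRAPHIC>"
        elif text[i].isalnum():
            start = end = i
            while end < len(text) and text[end].isalnum() and not is_cjk_ideograph(text[end]):
                end += 1
            kind = "<ALPHANUM>"
        else:
            i += 1
            continue
        tokens.append({
            "token": text[start:end],
            "start_offset": start,
            "end_offset": end,
            "type": kind,
            "position": len(tokens),
        })
        i = end
    return tokens


# Three ideographs in, three single-character tokens out
# (json.dumps escapes non-ASCII characters by default).
print(json.dumps({"tokens": simple_tokenize("中文字")}, indent=2))
```

Each ideograph gets its own position, which is why the inverted index ends up holding X, Z and Y as separate terms rather than XZY.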

So I'm still confused about why I can't get the expected result by passing 'X*Y' to the wildcard query.

dadoonet · December 17, 2016, 8:21am

So if you have these terms in the inverted index:

X
Y
Z

X*Y won't match any of those, right?
X*, Y*, Z* will.
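This point, together with the earlier guess about an analyzer that kept the whole value as one term, can be checked with Python's `fnmatch.fnmatchcase`, used here as a stand-in for wildcard matching against each term. Assumptions: X, Z and Y are placeholders for CJK characters; `whole_value_terms` is a hypothetical index where the value was stored as a single untokenized term (as a `keyword` field does in 5.x); `hits` is a hypothetical helper; and `fnmatchcase` also understands `[...]` character sets, which the Elasticsearch wildcard query does not.

```python
from fnmatch import fnmatchcase

# Two ways the value "XZY" could end up in the term dictionary.
per_character_terms = ["X", "Y", "Z"]  # analyzed field, as _analyze showed
whole_value_terms = ["XZY"]            # hypothetical: one untokenized term


def hits(pattern: str, terms: list[str]) -> list[str]:
    # '*' matches any run of characters, '?' exactly one.
    # Caveat: fnmatchcase also treats [...] as a character set.
    return [t for t in terms if fnmatchcase(t, pattern)]


for pattern in ["X*", "Y*", "Z*", "X*Y"]:
    print(pattern, hits(pattern, per_character_terms), hits(pattern, whole_value_terms))

# X* ['X'] ['XZY']
# Y* ['Y'] []
# Z* ['Z'] []
# X*Y [] ['XZY']
```

Against per-character terms, "X*Y" finds nothing while "X*", "Y*" and "Z*" each hit one term; only a single whole-value term lets "X*Y" match.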

system · January 14, 2017, 8:21am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Category	Replies	Views	Activity
ES cannot search for special characters when using wildcard search	Elasticsearch	4	327	December 6, 2022
Can we use Chinese character in wildcard query	Elasticsearch	3	1768	July 6, 2017
"query_string" dosen't analyze wildcard queries	Elasticsearch	5	4844	December 28, 2017
Wildcard with ascii	Elasticsearch	2	320	July 6, 2017
Query string with wild card not returning the expected results , all the times	Elasticsearch	5	699	December 17, 2019