Wildcard can not search non-English text

bruce88 · December 9, 2017, 3:56am

Yesterday I used wildcard queries to search for some keywords, but they always failed to match non-English text. Details are below.

Query Clause
{
  "query": {
    "bool": {
      "should": [
        {
          "wildcard": {
            "temp_path": "*kimch*"
          }
        },
        {
          "wildcard": {
            "temp_path": "*迷惑メール*"
          }
        }
      ]
    }
  }
}
The field temp_path has type "text" and is set to "not_analyzed".

The documentation says: "In order to prevent extremely slow wildcard queries, a wildcard term should not start with one of the wildcards * or ?." Following that, I used "迷惑メール*" instead of "*迷惑メール*", and then it did match the Japanese text.

So my question is: why does a wildcard term like "*xxx*" work on English text but fail to match non-English text?

Thanks!

Bruce, 2017/12/09

dadoonet · December 9, 2017, 4:37am

1st: avoid wildcards!

2nd: it’s probably happening because your text has been analyzed with the default analyzer at index time but when using wildcards the text is not analyzed. So analyzed terms and non analyzed terms don’t match.
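
To make the mismatch concrete, here is a minimal Python sketch (it does not call Elasticsearch). It assumes the default analyzer splits the Japanese value into several short terms while an English word stays whole; the token lists are hypothetical and should be confirmed with the _analyze API. `fnmatchcase` stands in for the wildcard matcher, which compares the pattern against each indexed term separately rather than against the original string.

```python
from fnmatch import fnmatchcase


def wildcard_hits(pattern, terms):
    """Return True if at least one single indexed term matches the pattern."""
    return any(fnmatchcase(term, pattern) for term in terms)


# Hypothetical terms, for illustration only (check real output with _analyze).
analyzed_english = ["kimchi"]                # one word stays one term
analyzed_japanese = ["迷", "惑", "メール"]    # assumed split of 迷惑メール
not_analyzed_japanese = ["迷惑メール"]        # whole value kept as one term

print(wildcard_hits("*kimch*", analyzed_english))           # True
print(wildcard_hits("*迷惑メール*", analyzed_japanese))      # False
print(wildcard_hits("*迷惑メール*", not_analyzed_japanese))  # True
```

Under these assumptions no single analyzed term contains the full string 迷惑メール, so the pattern cannot match, while the English term still contains "kimch".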

Try to see what the _analyze API is telling you about the way your text is analyzed and you’ll get a better idea.
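
A sketch of that check, built in Python so the request can be printed and pasted into any client. The index name `my_index` and the sample text are assumptions; `field` and `text` are parameters of the _analyze API, and passing `field` asks Elasticsearch to use the analyzer configured for that field.

```python
import json


def build_analyze_request(index, field, text):
    """Return (method, path, body) for an _analyze call against one field."""
    body = {"field": field, "text": text}
    return "GET", f"/{index}/_analyze", body


# "my_index" is a hypothetical index name; temp_path is the field from the question.
method, path, body = build_analyze_request("my_index", "temp_path", "迷惑メール")
print(method, path)
print(json.dumps(body, ensure_ascii=False, indent=2))
```

If the response lists several tokens for the sample text, the field is being analyzed, whatever the mapping was intended to do.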

bruce88 · December 9, 2017, 7:14am

Hi David

Thank you for your reply.
I totally agree with you that we should avoid wildcards. I'm sure the field is not analyzed.
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "test": {
      "properties": {
        "temp_path": {
          "type": "text",
          "index": "not_analyzed"
        }
      }
    }
  }
}

I'm just curious why "*迷惑メール*" cannot match the text but "迷惑メール*" can, and why "*kimch*" matches all the English text.

Bruce

dadoonet · December 9, 2017, 8:52am

So your text is not analyzed. Interesting.

@johtani do you have an idea?

bruce88 · December 9, 2017, 11:46am

Hi David

It was my mistake: I confused "type:text" with "type:string". You are right, the text is analyzed. Sorry for taking up your time.
To anyone else viewing this topic, please be careful about this.

~~
The string field datatype has been replaced by the text field for full text analyzed content, and the keyword field for not-analyzed exact string values.
~~
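
For anyone landing here, a hedged sketch of the mapping that the note above points to, assuming exact, not-analyzed values are what is wanted for temp_path. It keeps the `test` mapping type and the shard setting from the mapping posted earlier in this topic and only swaps the field type to `keyword`; the helper name `keyword_mapping` is made up for this example, and newer Elasticsearch versions no longer use mapping types.

```python
import json


def keyword_mapping(field_name, mapping_type="test"):
    """Build index settings and mappings with one not-analyzed keyword field."""
    return {
        "settings": {"number_of_shards": 1},
        "mappings": {
            mapping_type: {
                "properties": {
                    # keyword stores the whole value as a single term
                    field_name: {"type": "keyword"}
                }
            }
        },
    }


mapping = keyword_mapping("temp_path")
print(json.dumps(mapping, indent=2))
```

With a single term per value, a pattern such as "*迷惑メール*" is compared against the full stored string, though a leading wildcard remains slow.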

Bruce
