Wildcard query cannot match non-English text

Yesterday I used wildcard queries to search for some keywords, but they always failed to match the non-English text. Details are below.

  • Query Clause
    {
      "query": {
        "bool": {
          "should": [
            { "wildcard": { "temp_path": "*kimch*" } },
            { "wildcard": { "temp_path": "*迷惑メール*" } }
          ]
        }
      }
    }

  • The field temp_path is of type "text" and set to "not_analyzed".

According to the documentation, "In order to prevent extremely slow wildcard queries, a wildcard term should not start with one of the wildcards * or ?". So I used "迷惑メール*" instead of "*迷惑メール*", and then it matched the Japanese text.

So my question is: for a wildcard term like "*xxx*", why does it work on English text but fail to match non-English text?

Thanks!

  • Bruce, 2017/12/09

1st: avoid wildcards!

2nd: this is probably happening because your text was analyzed with the default analyzer at index time, but a wildcard query does not analyze its search term. So the analyzed terms in the index and the non-analyzed query term don't match.

Check what the _analyze API tells you about how your text is analyzed, and you'll get a better idea.
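To follow that suggestion, a request body like the one below can be sent to `GET /my_index/_analyze`, where `my_index` is a placeholder for the real index name. The `field` parameter asks Elasticsearch to use whatever analyzer is mapped for `temp_path`, so the response lists the terms that are actually stored in the index for that text.

```json
{
  "field": "temp_path",
  "text": "迷惑メール"
}
```

If the field is analyzed by the default standard analyzer, the Japanese phrase will likely come back split into several shorter tokens (Han characters, for instance, typically become one token each), so no single indexed term contains `迷惑メール` and `*迷惑メール*` has nothing to match. An English word like "kimchi" stays a single lowercased token, which would explain why `*kimch*` still hits.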

Hi David

Thank you for replying.
I totally agree that we should avoid wildcards. But I'm sure the field is not analyzed:
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "test": {
      "properties": {
        "temp_path": {
          "type": "text",
          "index": "not_analyzed"
        }
      }
    }
  }
}

I'm just curious why "*迷惑メール*" cannot match the text but "迷惑メール*" can, and why "*kimch*" matches all the English text.

Bruce

So your text is not analyzed. Interesting.

@johtani do you have an idea?

Hi David

It was my mistake: I mixed up "type": "text" and "type": "string". You are right, the text is analyzed. Sorry for taking your time.
Anyone else reading this topic, please be careful about this:

"The string field datatype has been replaced by the text field for full text analyzed content, and the keyword field for not-analyzed exact string values."
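For an exact, not-analyzed value, a mapping along these lines uses `keyword` in place of `text` with `"index": "not_analyzed"`. It is a sketch that keeps the `test` type and `temp_path` field from the mapping earlier in this thread, and it would have to be applied to a new index (then reindexed), since an existing field's type cannot be changed in place.

```json
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "test": {
      "properties": {
        "temp_path": {
          "type": "keyword"
        }
      }
    }
  }
}
```

With `keyword`, the whole value is indexed as a single term, so a pattern like `*迷惑メール*` is matched against the full string. Keep in mind that `keyword` values are not lowercased by default, so `*kimch*` would then only match text with that exact casing.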

Bruce
