Can we use Chinese characters in a wildcard query?


(mohit.Kumar) #1

Hi folks,
I am trying to search for a Chinese keyword, but the query always returns
zero hits.

Elasticsearch query :

{"query":{"wildcard" : { "text" : " 好不 " }}}

I am using a Java program to send this query. I have tried converting the
query to UTF-8, but I still get no results.
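One way to rule out an encoding problem on the client side is sketched below. It is a minimal illustration, not the poster's actual code: the class and method names are hypothetical, and the `*` characters around the keyword are an assumption about the intended pattern. The sketch encodes the request body explicitly as UTF-8 and, as an alternative, rewrites non-ASCII characters as JSON Unicode escapes so the body is pure ASCII and independent of the platform default charset.

```java
import java.nio.charset.StandardCharsets;

class QueryEncoding {
    // Rewrite every non-ASCII UTF-16 unit as a JSON backslash-u escape,
    // so the resulting request body contains only ASCII characters.
    static String escapeNonAscii(String json) {
        StringBuilder out = new StringBuilder(json.length());
        for (int i = 0; i < json.length(); i++) {
            char c = json.charAt(i);
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04x", (int) c));
            }
        }
        return out.toString();
    }

    // Encode explicitly as UTF-8 instead of relying on the default charset.
    static byte[] toUtf8(String json) {
        return json.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The keyword from the post (U+597D U+4E0D), written with source-level
        // escapes so this file compiles under any source encoding.
        // Assumption: the surrounding * wildcards are what was intended.
        String pattern = "*\u597d\u4e0d*";
        String body = "{\"query\":{\"wildcard\":{\"text\":\"" + pattern + "\"}}}";

        System.out.println(escapeNonAscii(body));
        System.out.println(toUtf8(body).length + " bytes as UTF-8");
    }
}
```

If the escaped, ASCII-only body still returns zero hits, the cause is more likely the pattern or the way the field was indexed than the client encoding.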

Thanks in advance.

Regards
Mohit

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Brian Yoder) #2

The prefix query (snippet below) works for me. For example:

{"prefix" : {"words" : "醫"}}

I haven't tried a wildcard query in Java, since it is rather like a very
slow grep and not generally useful. Trailing wildcards are logically the
same as prefix queries, and they are typically rather fast in my experience.
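The equivalence between a trailing wildcard and a prefix can be checked locally with a small Java sketch. Everything here is illustrative: the class and helper names are hypothetical, and `wildcardMatches` is only a local stand-in for the wildcard semantics (`*` for any sequence of characters, `?` for exactly one), not Elasticsearch code. It also shows that a pattern with no wildcard characters matches only the exact term.

```java
import java.util.regex.Pattern;

class PrefixVsWildcard {
    // Quote a value as a JSON string, escaping backslashes and double quotes.
    static String jsonString(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    // Build a prefix clause in the same shape as the snippet in this post.
    static String prefixQuery(String field, String prefix) {
        return "{\"prefix\":{" + jsonString(field) + ":" + jsonString(prefix) + "}}";
    }

    // Build the corresponding wildcard clause.
    static String wildcardQuery(String field, String pattern) {
        return "{\"wildcard\":{" + jsonString(field) + ":" + jsonString(pattern) + "}}";
    }

    // Local stand-in for wildcard matching against a single indexed term.
    static boolean wildcardMatches(String pattern, String term) {
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < pattern.length(); ) {
            int cp = pattern.codePointAt(i);
            i += Character.charCount(cp);
            if (cp == '*') {
                regex.append(".*");
            } else if (cp == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(new String(Character.toChars(cp))));
            }
        }
        return Pattern.compile(regex.toString(), Pattern.DOTALL).matcher(term).matches();
    }

    public static void main(String[] args) {
        String prefix = "\u91ab";       // U+91AB, the prefix used in this post
        String term = "\u91ab\u751f";   // U+91AB U+751F, a term starting with it

        System.out.println(prefixQuery("words", prefix));
        System.out.println(wildcardQuery("words", prefix + "*"));

        // A trailing wildcard and a prefix test agree on the same term.
        System.out.println(wildcardMatches(prefix + "*", term) == term.startsWith(prefix));
        // Without any wildcard character, only the exact term would match.
        System.out.println(wildcardMatches(prefix, term));
    }
}
```

The two clauses select the same terms, so the prefix form is the natural choice when only the start of the term is known.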

I hope this helps!

Brian

(In reply to Mohit Kumar Yadav's post of Monday, October 21, 2013 9:04:48 AM UTC-4.)


(Brian Yoder) #3

By the way, when I first tried to create a working example using my local
test/dev index, my Chinese characters were missing and queries against them
did not work. I don't exactly recall the last time I deleted and reloaded
that index, nor do I remember exactly which ES versions were changed. But I
am currently running on ES 0.90.3, and I believe the index was deleted and
recreated (with successful regression tests including Chinese characters)
no earlier than 0.90.0. So I don't have any logs to show; just results. But
here are the results:

In general, this is against a synonym "table". (Yeah, I know. But I do find
that a separate query for synonyms means that changing synonyms does not
require a reload or reindex of the data. And performance is very good.)

{
  "bool" : {
    "must" : [ {
      "match" : {
        "field" : {
          "query" : "gn",
          "type" : "boolean"
        }
      }
    }, {
      "prefix" : {
        "words" : "醫"
      }
    } ]
  }
}

  1. When I first used my current laptop set-up to get a working example,
    nothing was found. When I queried one of the English terms, the following
    result came back. Note that the last value is expected to be a Chinese
    phrase but comes out null instead:

{ "field" : [ "gn" , "o" , "cnam" ] , "words" : [ "Dr" , "Doctor" , "MD" ,
"Phd" , null ] }

  2. After deleting and reloading the index, the query now returns all words
    including the Chinese:

{ "field" : [ "gn" , "o" , "cnam" ] , "words" : [ "Dr" , "Doctor" , "MD" ,
"Phd" , "醫生" ] }

Not sure why, since this has always worked, starting with my initial ES
version (19.4), and had not failed until today.

Brian


