Java transport client: match query with Chinese text returns no hits


#1

Elasticsearch version (bin/elasticsearch --version):
Version: 2.4.4, Build: fcbb46d/2017-01-03T11:33:16Z, JVM: 1.8.0_112
Plugins installed: [ik, head]
JVM version (java -version):
JVM: 1.8.0_112
OS version (uname -a if on a Unix-like system):
Darwin DoudeMacBook-Pro.local 16.6.0 Darwin Kernel Version 16.6.0: Fri Apr 14 16:21:16 PDT 2017; root:xnu-3789.60.24~6/RELEASE_X86_64 x86_64

Description of the problem including expected versus actual behavior:
Using the Java transport client, I build a match query with operator "and" and a Chinese value. The index contains documents that should match, but the query returns no hits.
Steps to reproduce:
Index:
{
  "user" : {
    "aliases" : { },
    "mappings" : {
      "userInfo" : {
        "properties" : {
          "age" : {
            "type" : "integer"
          },
          "created" : {
            "type" : "date",
            "format" : "strict_date_optional_time||epoch_millis"
          },
          "name" : {
            "type" : "string",
            "store" : true
          },
          "userId" : {
            "type" : "string",
            "index" : "not_analyzed",
            "store" : true
          }
        }
      }
    },
    "settings" : {
      "index" : {
        "client" : {
          "transport" : {
            "ping_timeout" : "60s"
          }
        },
        "creation_date" : "1502344889730",
        "number_of_shards" : "2",
        "number_of_replicas" : "1",
        "uuid" : "v88dMBG0QbmiTrLZ0rIqpQ",
        "version" : {
          "created" : "2040499"
        }
      }
    },
    "warmers" : { }
  }
}
Insert three records:
User user1 = new User("105", "Test ABC 芦玉Match123", 31, new Date());
User user2 = new User("106", "test 开发 芦match玉", 41, new Date());
User user3 = new User("107", "This 开发 is a 芦玉test", 51, new Date());

Query generated by the client:
{
  "size" : 100,
  "query" : {
    "bool" : {
      "should" : {
        "match" : {
          "name" : {
            "query" : "\u82A6\u7389",
            "type" : "phrase_prefix",
            "operator" : "AND",
            "minimum_should_match" : "90%"
          }
        }
      }
    }
  },
  "explain" : true
}
Result (no hits):
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": [ ]
  }
}

Now the same query with the Chinese characters written directly:
{
  "size": 100,
  "query": {
    "bool": {
      "should": {
        "match": {
          "name": {
            "query": "芦玉",
            "type": "phrase_prefix",
            "operator": "AND",
            "minimum_should_match": "90%"
          }
        }
      }
    }
  },
  "explain": true
}

Result (2 hits):
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 2,
    "successful": 2,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.52021796,
    "hits": [...


(David Pilato) #2

Could you provide a full recreation script as described in

It will help to better understand what you are doing and we will be able to play with your script just by copy and pasting.

I think it's related to analyzers here.

BTW, have a look at the _analyze API to see how your terms are actually indexed.
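A minimal sketch of that check, assuming a node on localhost:9200 and the user index above. It builds two _analyze request URLs for the name field (Elasticsearch 2.x accepts field and text as request parameters): one with the raw Chinese text and one with the text after it has been escaped into backslash-u sequences. AnalyzeUrl and analyzeUrl are hypothetical names. Opening each URL (for example with curl) should show which tokens each variant actually produces.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Sketch: build Elasticsearch 2.x _analyze URLs for the "name" field.
// BASE (host and port) is an assumption; adjust it to your cluster.
final class AnalyzeUrl {

    static final String BASE = "http://localhost:9200";

    // Returns <BASE>/<index>/_analyze?field=<field>&text=<percent-encoded text>
    static String analyzeUrl(String index, String field, String text) {
        try {
            // Percent-encode as UTF-8 so the Chinese characters survive the query string.
            return BASE + "/" + index + "/_analyze?field=" + field
                    + "&text=" + URLEncoder.encode(text, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always available", e);
        }
    }

    public static void main(String[] args) {
        // Unicode escapes in Java source: the compiler turns these into the two characters.
        String raw = "\u82A6\u7389";
        // Doubled backslashes: 12 plain ASCII characters, like the output of an escaper.
        String escaped = "\\u82A6\\u7389";
        System.out.println(analyzeUrl("user", "name", raw));
        // prints http://localhost:9200/user/_analyze?field=name&text=%E8%8A%A6%E7%8E%89
        System.out.println(analyzeUrl("user", "name", escaped));
        // prints http://localhost:9200/user/_analyze?field=name&text=%5Cu82A6%5Cu7389
    }
}
```

The two URLs differ in what reaches the analyzer: the first carries the UTF-8 bytes of 芦玉, the second carries a literal backslash followed by ASCII letters and digits.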


#3
  1. Create a simple index and mapping; the mapping uses the default analyzer.
  2. Index some records containing Chinese words.
  3. Use the Java client to build a match query with operator 'and' and Chinese words as the value.
  4. Execute the query with the Java client: it returns no hits, even though the index really contains those words.
  5. The Java client's toString() method prints the query, and in that output the Chinese words appear converted to \uxxxx.
  6. If I paste the printed JSON into the head plugin and search, it also returns no hits.
  7. But if I change the \uxxxx sequences back to the real Chinese characters, the head query returns hits.
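Steps 5 to 7 fit a double-escaping effect. A minimal sketch, assuming the client serializes the query with standard JSON escaping: jsonQuote is a hypothetical, minimal JSON string encoder that only escapes backslash and double quote. Quoting the raw text keeps the two Chinese characters, while quoting text that was already escaped doubles each backslash, so a JSON parser reads back the literal text \u82A6\u7389 rather than 芦玉.

```java
// Sketch: what happens when already-escaped text is serialized to JSON.
// jsonQuote is a hypothetical, minimal encoder that only handles backslash and
// double quote; a real JSON serializer escapes those two the same way.
final class JsonQuoteDemo {

    static String jsonQuote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' || c == '"') {
                sb.append('\\'); // JSON requires a backslash before these two characters
            }
            sb.append(c);
        }
        return sb.append('"').toString();
    }

    public static void main(String[] args) {
        String raw = "\u82A6\u7389";          // the two Chinese characters
        String preEscaped = "\\u82A6\\u7389"; // 12 ASCII characters produced by an escaper
        System.out.println(jsonQuote(raw));        // the two characters inside quotes
        System.out.println(jsonQuote(preEscaped)); // prints "\\u82A6\\u7389"
    }
}
```

In JSON, a single-backslash sequence like \u82A6 decodes to a character, but the doubled form decodes to a backslash followed by the letters and digits, which is why only the version with real Chinese characters matched.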

(David Pilato) #4

I understand what you are doing. I just need a script I can run to reproduce it.


#5

How do I upload scripts?


#6

My fault, thank you very much. I was using Apache Commons Lang's StringEscapeUtils to escape special characters, and it converts Chinese characters to \uxxxx.
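To illustrate the cause and the fix: escapeNonAscii below is a hypothetical stand-in that only mimics how StringEscapeUtils.escapeJava rewrites characters above 0x7F (a backslash, 'u', then four uppercase hex digits); it is not the library's code and ignores everything else that method escapes. The fix is simply not to escape the search text and to pass it unchanged to the query builder, for example QueryBuilders.matchQuery("name", text); the client handles JSON encoding itself.

```java
import java.util.Locale;

// Sketch of the cause and the fix.
// escapeNonAscii is a hypothetical stand-in, NOT Commons Lang code: it only mimics
// how StringEscapeUtils.escapeJava rewrites characters above 0x7F and ignores
// everything else that method escapes.
final class EscapeDemo {

    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c > 0x7F) {
                // Six ASCII characters: backslash, 'u', then four uppercase hex digits.
                sb.append(String.format(Locale.ROOT, "\\u%04X", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u82A6\u7389";         // what the user wants to find (2 chars)
        String before = escapeNonAscii(text); // what was searched before the fix
        String after = text;                  // the fix: pass the text through unchanged,
                                              // e.g. QueryBuilders.matchQuery("name", after)
        System.out.println(before.length());  // prints 12
        System.out.println(after.length());   // prints 2
    }
}
```

Escaping like this is meant for producing Java source literals, not for values that a client will serialize into a request body on its own.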

How should I escape the {, } and ! characters?
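Assuming the question is about query syntax rather than forum formatting: a match query does not parse any query syntax, so {, } and ! need no escaping there and the text should go in as-is. Escaping only matters for query_string queries. For that case, a minimal sketch; escapeQueryString is a hypothetical helper modeled on the character set handled by Lucene's classic QueryParser.escape, and in real code calling Lucene's own method is preferable.

```java
// Sketch: escaping query_string syntax characters.
// Only needed for query_string queries; a match query does not parse this
// syntax, so its text should be passed unchanged.
// escapeQueryString is a hypothetical helper modeled on the character set of
// Lucene's classic QueryParser.escape.
final class QueryStringEscape {

    // Characters treated as syntax by the classic query parser.
    private static final String RESERVED = "\\+-!():^[]\"{}~*?|&/";

    static String escapeQueryString(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (RESERVED.indexOf(c) >= 0) {
                sb.append('\\'); // prefix each reserved character with a backslash
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeQueryString("{abc}!")); // prints \{abc\}\!
    }
}
```

Chinese characters are not in the reserved set, so they pass through untouched; only the syntax characters gain a backslash.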


(David Pilato) #7

Please read again:


(system) #8

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.