Hi there,
I have an integration test derived from ESIntegTestCase in which I index a document with Chinese characters in some fields, e.g.:
POST /account/account
{
  "account_number" : 6666,
  "balance" : 1515,
  "firstname" : "盛虹",
  "lastname" : "Last",
  "age" : 32,
  "gender" : "M",
  "address" : "4587 Some Corridor",
  "employer" : "Some company",
  "email" : "someone@gmail.com",
  "city" : "Beijing",
  "state" : "CN"
}
And then I just search for this document by account_number:
GET account/_search
{
  "query": {
    "term": {
      "account_number": {
        "value": "6666",
        "boost": 1.0
      }
    }
  }
}
The result is the following (note the firstname field value):
{
  "took" : 7,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "account",
        "_type" : "account",
        "_id" : "iyb5vWoBQx5pa0FSu3t7",
        "_score" : 1.0,
        "_source" : {
          "account_number" : 6666,
          "balance" : 1515,
          "firstname" : "τ\u203a\u203aΦÖ╣",
          "lastname" : "Last",
          "age" : 32,
          "gender" : "M",
          "address" : "4587 Some Corridor",
          "employer" : "Some company",
          "email" : "someone@gmail.com",
          "city" : "Beijing",
          "state" : "CN"
        }
      }
    ]
  }
}
The interesting part is that I only get this result on my Windows machine. On Linux the test passes, meaning I get back exactly the value I stored. Even more interesting, when I run OSS Elasticsearch manually on the same Windows machine and repeat the steps from the test by hand, the search result contains the correct value.
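To illustrate what I think I am seeing: the garbled value has six characters where the original has two, which would be consistent with the six UTF-8 bytes of 盛虹 being decoded one byte at a time by a single-byte charset. The sketch below reproduces that kind of mismatch in isolation. The class and method names are hypothetical, ISO-8859-1 is only a stand-in for whatever charset is actually involved on my machine, and I have not confirmed that this is what happens inside the test.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the actual integration test.
class CharsetMismatchDemo {

    // Encodes the text as UTF-8, then decodes those bytes with the given charset.
    static String decodeUtf8BytesAs(String text, Charset decodeWith) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, decodeWith);
    }

    public static void main(String[] args) {
        // The firstname value from the document, written with escapes so the
        // source file encoding does not matter.
        String original = "\u76db\u8679";

        // Assumption: a single-byte charset stands in for the wrong decoder.
        String garbled = decodeUtf8BytesAs(original, StandardCharsets.ISO_8859_1);
        String intact = decodeUtf8BytesAs(original, StandardCharsets.UTF_8);

        System.out.println("original length: " + original.length());
        System.out.println("garbled length:  " + garbled.length());
        System.out.println("round trip ok:   " + original.equals(intact));
    }
}
```

If something like this is the cause, any String.getBytes() or new String(bytes) call without an explicit charset would be a suspect, since those use the JVM default charset, which can differ between Windows and Linux.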
I am stuck and not sure how to proceed. I tried removing the file.encoding option from jvm.options in my local cluster (on Windows), but that doesn't change anything. Initially I suspected this had something to do with Windows filesystem encoding, but then why does the manual test work fine? Perhaps my local cluster comes with settings that override something the integration test cluster does not.
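One way I can think of to compare the two environments is to print the effective encoding from inside the test JVM and from a JVM started with the same options as my local node. This is a minimal sketch; the class name is hypothetical, and I am only assuming that the default charset is the setting that differs.

```java
import java.nio.charset.Charset;

// Hypothetical helper for comparing JVM encoding settings between runs.
class JvmEncodingReport {

    // Collects the OS name, the file.encoding system property, and the
    // charset the JVM actually uses when none is given explicitly.
    static String describe() {
        return "os.name=" + System.getProperty("os.name")
            + ", file.encoding=" + System.getProperty("file.encoding")
            + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Calling JvmEncodingReport.describe() from the test and running the main method with the local cluster's JVM options should show whether the two JVMs disagree.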
Here is the mapping for the account index used in the test:
{
  "settings" : {
    "number_of_shards" : 1
  },
  "mappings" : {
    "account" : {
      "properties" : {
        "gender" : {
          "type" : "text",
          "fielddata" : true
        },
        "address" : {
          "type" : "text",
          "fielddata" : true
        },
        "state" : {
          "type" : "text",
          "fielddata" : true,
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}