Sort Chinese error


(yin weifeng) #1

I use code like this:

SearchResponse searchResponse = client.prepareSearch("My_db")
    .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
    .setQuery(queryBuilder)
    .addSort("xxx", SortOrder.DESC)...

I want to sort by field "xxx", but when the value of "xxx" is Chinese, or
contains characters such as '#' or '%', ES throws the error "Query Failed
[Failed to execute main query]". What should I do?


(Ivan Brusic) #2

What is the exact error? One possible cause is that field "xxx" is
analyzed; you can only sort on non-analyzed fields.
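A rough illustration of why an analyzed field is a poor sort key, as a minimal Java sketch. The toyAnalyze method is hypothetical and only approximates a standard-style analyzer: it lowercases, drops punctuation such as '#' and '%', and emits each Chinese character as a separate token. It is not Elasticsearch's analyzer. A sort needs exactly one term per document, which only the keyword (not_analyzed) variant guarantees.

```java
import java.util.ArrayList;
import java.util.List;

public class AnalyzedVsKeyword {
    // Hypothetical toy analyzer: lowercases letters/digits, drops punctuation
    // such as '#' and '%', and emits each Han character as its own token.
    // It only approximates a standard-style analyzer.
    static List<String> toyAnalyze(String value) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        value.codePoints().forEach(cp -> {
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN) {
                flush(word, tokens);
                tokens.add(new String(Character.toChars(cp)));
            } else if (Character.isLetterOrDigit(cp)) {
                word.appendCodePoint(Character.toLowerCase(cp));
            } else {
                flush(word, tokens); // spaces and punctuation end a word and are dropped
            }
        });
        flush(word, tokens);
        return tokens;
    }

    // Emits the pending word (if any) as a token and resets the buffer.
    private static void flush(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            tokens.add(word.toString());
            word.setLength(0);
        }
    }

    // A not_analyzed (keyword) field indexes the whole value as a single term.
    static List<String> keyword(String value) {
        return List.of(value);
    }

    public static void main(String[] args) {
        for (String v : new String[] {"北京大学", "#%", "Hello World"}) {
            System.out.println(v + " analyzed=" + toyAnalyze(v) + " keyword=" + keyword(v));
        }
    }
}
```

For "北京大学" the toy analyzer yields four terms and for "#%" none at all, while the keyword form always yields exactly one, so only the latter gives a well-defined sort value.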

(In reply to 伟峰 殷 ywf1990@gmail.com, Wed, Feb 1, 2012 at 9:42 PM)


(yin weifeng) #3

Thanks Ivan!

I want to do term searches and also sort on the same field. Should I index the same content into two different fields, or is there another way?


(Jan Fiedler) #5

For correct, locale-specific sorting you should create a separate field for
sorting purposes. This is best done via the multi-field mapping
(http://www.elasticsearch.org/guide/reference/mapping/multi-field-type.html).

The sort field should use a special (sort) analyzer that performs collation
for Chinese. In simple terms, a collator takes your term and calculates a
sorting key (which does not resemble the term). Take a look at the ICU
plugin and its collators
(http://www.elasticsearch.org/guide/reference/index-modules/analysis/icu-plugin.html).
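To make the "sorting key" idea concrete, here is a minimal Java sketch. It uses the JDK's java.text.Collator as a stand-in for ICU (an assumption: the ICU plugin uses ICU4J, whose Chinese rules may order some characters differently). The class name CollationKeySketch is hypothetical.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

public class CollationKeySketch {
    // Computes a binary sort key for a term. Keys produced by the same
    // collator can be compared directly, without the original terms.
    static byte[] sortKey(Collator collator, String term) {
        CollationKey key = collator.getCollationKey(term);
        return key.toByteArray();
    }

    public static void main(String[] args) {
        // Assumption: JDK collator for Chinese (China) instead of ICU.
        Collator collator = Collator.getInstance(Locale.CHINA);
        for (String term : new String[] {"北京", "上海", "广州"}) {
            // The key bytes bear no visible resemblance to the term itself.
            System.out.println(term + " -> " + Arrays.toString(sortKey(collator, term)));
        }
    }
}
```

Per the JDK documentation, comparing two such byte arrays byte by byte gives the same result as comparing the original strings with the collator, which is what allows an index to store the key instead of the term.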


(yin weifeng) #6

Hi Jan,

Thank you for your help!
With this mapping I can now both search and sort on the field:
...
"fields" : {
"fieldName" : {"type" : "string", "index" : "analyzed"},
"sortFieldName" : {"type" : "string", "index" : "not_analyzed"}
}
...

But for Chinese, sorting on the "not_analyzed" field doesn't seem to give a meaningful order.
How do I define the special sort analyzer, for example one that sorts phonetically?
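The difference described here can be reproduced outside Elasticsearch. This minimal Java sketch sorts the same Chinese terms two ways: by raw UTF-16 code units (roughly the term order a not_analyzed field gives, an approximation) and with a locale-aware java.text.Collator (used here as an assumed stand-in for ICU collation). The class and method names are hypothetical.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

public class CodePointVsCollated {
    // Orders terms by String.compareTo, i.e. char-by-char numeric order,
    // which for Chinese characters follows Unicode code points.
    static String[] codePointOrder(String[] terms) {
        String[] copy = terms.clone();
        Arrays.sort(copy);
        return copy;
    }

    // Orders the same terms with a locale-aware collator.
    static String[] collatedOrder(String[] terms, Locale locale) {
        String[] copy = terms.clone();
        Arrays.sort(copy, Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        String[] cities = {"上海", "北京", "广州", "天津"};
        System.out.println("code point: " + Arrays.toString(codePointOrder(cities)));
        System.out.println("collated:   " + Arrays.toString(collatedOrder(cities, Locale.CHINA)));
    }
}
```

Code-point order reflects where the characters happen to sit in Unicode, not how they are pronounced, which is why it looks arbitrary to a Chinese reader.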

(In reply to Jan Fiedler fiedler.jan@gmail.com, 2012/2/3)


(Jan Fiedler) #8

You are close but not there yet. To get Chinese sorting right you need the
following 3 additional steps:

1. Use a sort analyzer for your sort field

...
"fields" : {
"fieldName" : {"type" : "string", "index" : "analyzed"},
"sortFieldName" : {"type" : "string", "index" : "analyzed", "analyzer" : "my_chinese_sort"}
}
...

2. Configure the sort analyzer (e.g. in elasticsearch.yml)

index:
  analysis:
    analyzer:
      my_chinese_sort:
        type: custom
        tokenizer: keyword
        filter: [icu_collation_chinese]
    filter:
      icu_collation_chinese:
        type: icu_collation
        language: ch

I am not sure about the exact language identifier for Chinese (the ISO 639-1
code is "zh"). I trust that ICU supports Chinese, but I did not try it.

3. Install the ICU plugin

Run the following from your ES home:

bin/plugin -install elasticsearch/elasticsearch-analysis-icu/1.1.0
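The three steps amount to an analyzer that emits exactly one term per value (the keyword tokenizer) and then replaces that term with its collation key (the icu_collation filter). The Java sketch below imitates that pipeline under two assumptions: it uses the JDK's java.text.Collator instead of ICU, and it hex-encodes the key bytes so that plain string order of the resulting terms matches collation order. The real plugin uses ICU4J and its own binary-to-term encoding; ChineseSortAnalyzerSketch and its methods are hypothetical names.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

public class ChineseSortAnalyzerSketch {
    private final Collator collator;

    ChineseSortAnalyzerSketch(Locale locale) {
        this.collator = Collator.getInstance(locale);
    }

    // Keyword "tokenizer": the entire field value becomes one token.
    static List<String> keywordTokenize(String value) {
        return List.of(value);
    }

    // Collation "filter": replace the token with its collation key,
    // hex-encoded (two lowercase hex digits per unsigned byte) so that
    // ordinary string comparison of terms follows the collator's order.
    String collationTerm(String token) {
        byte[] key = collator.getCollationKey(token).toByteArray();
        StringBuilder hex = new StringBuilder(key.length * 2);
        for (byte b : key) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    // The whole hypothetical "my_chinese_sort" analyzer: one term per value.
    String analyze(String value) {
        return collationTerm(keywordTokenize(value).get(0));
    }

    public static void main(String[] args) {
        ChineseSortAnalyzerSketch analyzer = new ChineseSortAnalyzerSketch(Locale.CHINA);
        List<String> docs = new ArrayList<>(List.of("上海", "北京", "广州", "天津"));
        // Sort "documents" by their indexed sort term, as the index would.
        docs.sort(Comparator.comparing(analyzer::analyze));
        System.out.println(docs);
    }
}
```

Because each value produces a single term, the index has one well-defined sort value per document, and ordering by that term reproduces the collator's order rather than code-point order.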


(yin weifeng) #9

Thank you!

You are right: this configuration really does order the field by Chinese
pinyin, so ICU must be handling that.

But the configuration in elasticsearch.yml had no effect, so I used this
request to configure the sort analyzer:

curl -XPOST localhost:9200/backlog_db -d '{
  "settings" : {
    "index" : {
      "analysis" : {
        "analyzer" : {
          "my_chinese_sort" : {
            "type" : "custom",
            "tokenizer" : "keyword",
            "filter" : ["icu_collation_chinese"]
          }
        },
        "filter" : {
          "icu_collation_chinese" : {
            "type" : "icu_collation",
            "language" : "ch"
          }
        }
      }
    }
  }
}'


(Jan Fiedler) #10

I recommend using the analyze API (interactively via curl) to test whether
your analyzer settings made it into your index correctly. Information on
using the analyze API is here:
http://www.elasticsearch.org/guide/reference/api/admin-indices-analyze.html
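A minimal Java sketch of the check suggested above: it builds an analyze-API URL for the custom analyzer so it can be pasted into curl. It assumes the request form GET /{index}/_analyze?analyzer=...&text=... with backlog_db and my_chinese_sort taken from this thread; verify the parameter names against the linked documentation for your version. The analyzeUrl helper is hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class AnalyzeUrl {
    // Builds an analyze-API URL for one index. The "analyzer" and "text"
    // query parameters are an assumption based on the linked documentation.
    static String analyzeUrl(String host, String index, String analyzer, String text) {
        return "http://" + host + "/" + index + "/_analyze"
                + "?analyzer=" + URLEncoder.encode(analyzer, StandardCharsets.UTF_8)
                + "&text=" + URLEncoder.encode(text, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Paste the printed URL into: curl -XGET '<url>'
        System.out.println(analyzeUrl("localhost:9200", "backlog_db", "my_chinese_sort", "北京"));
    }
}
```

If the analyzer was registered, the response should list a single token (the collation key) for the whole text; an error about an unknown analyzer means the settings did not make it into the index.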


(yin weifeng) #11

Now it works well on Windows 7, but on Linux (Ubuntu 11.10 and CentOS 5.6)
the sort results are different, even though the configuration is the same.

What could be the reason?

Thanks!

(In reply to Jan Fiedler fiedler.jan@gmail.com, 2012/2/6)