Sort Chinese error

yin_weifeng · February 2, 2012, 5:42am

I use the following code:

SearchResponse searchResponse = client.prepareSearch("My_db")
    .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
    .setQuery(queryBuilder)
    .addSort("xxx", SortOrder.DESC)...

I want to sort by the field "xxx", but when the value of "xxx" is
Chinese, or contains characters such as '#' or '%', ES throws an error:
Query Failed [Failed to execute main query]. What should I do?

Ivan · February 3, 2012, 5:11am

What is the exact error? One possible cause is that the field "xxx" is
analyzed. You can only sort on non-analyzed fields.

yin_weifeng · February 3, 2012, 8:45am

Thanks Ivan!

I want to run term searches and also sort on the same field. Should I index the same content into two different fields, or is there another way?

Jan_Fiedler · February 3, 2012, 10:31am

For correct, locale-specific sorting you should create a separate field for
sorting purposes. This is best done via the multi-field mapping
(http://www.elasticsearch.org/guide/reference/mapping/multi-field-type.html).

The sort field should use a special (sort) analyzer that performs collation
for Chinese. In simple terms, a collator takes your term and calculates a
sorting key (that does not resemble the term). Take a look at the ICU
plugin and its collators
(http://www.elasticsearch.org/guide/reference/index-modules/analysis/icu-plugin.html).
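
To illustrate what "calculates a sorting key" means, here is a minimal Java sketch. It uses the JDK's own java.text.Collator rather than the ICU plugin (an assumption made only to keep the example self-contained), and the class and method names are hypothetical. The order produced for Chinese depends on the collation rules shipped with the JVM, so no particular order is claimed here.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

public class CollationKeySketch {

    // Hypothetical helper: the binary sort key a collator derives from a term.
    public static byte[] sortKey(Collator collator, String term) {
        return collator.getCollationKey(term).toByteArray();
    }

    // Sort terms by comparing their precomputed collation keys,
    // not the terms themselves.
    public static List<String> sortByKey(Collator collator, List<String> terms) {
        List<CollationKey> keys = new ArrayList<>();
        for (String term : terms) {
            keys.add(collator.getCollationKey(term));
        }
        Collections.sort(keys); // CollationKey implements Comparable
        List<String> sorted = new ArrayList<>();
        for (CollationKey key : keys) {
            sorted.add(key.getSourceString());
        }
        return sorted;
    }

    public static void main(String[] args) {
        // JDK collator for Chinese; stands in for the ICU collation filter.
        Collator collator = Collator.getInstance(Locale.CHINA);
        List<String> terms = Arrays.asList("\u4e2d\u6587", "\u963f\u59e8", "\u5317\u4eac");
        for (String term : terms) {
            // The key bytes do not resemble the original term.
            System.out.println(term + " -> " + Arrays.toString(sortKey(collator, term)));
        }
        System.out.println(sortByKey(collator, terms));
    }
}
```

Comparing the keys gives the same order as comparing the terms with the collator directly, which is why a sort field can store only the key.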

yin_weifeng · February 3, 2012, 12:21pm

Hi Jan,

Thank you for your help!
I use this mapping, and now I can both sort and search on the field:
...
"fields" : {
    "fieldName" : {"type" : "string", "index" : "analyzed"},
    "sortFieldName" : {"type" : "string", "index" : "not_analyzed"}
}
...

But for Chinese, sorting on a "not_analyzed" field does not seem to produce a meaningful order.
How do I define the special sort analyzer, for example to sort phonetically?
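
The effect described here can be reproduced outside Elasticsearch. The sketch below is only an approximation: it uses plain String.compareTo (which compares UTF-16 code units) as a stand-in for the raw term order of an unanalyzed field. The class name is hypothetical, and the pinyin readings in the comments are annotations added for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class RawOrderSketch {

    // Sort by natural String order, i.e. by UTF-16 code unit values.
    public static List<String> sortRaw(List<String> terms) {
        List<String> sorted = new ArrayList<>(terms);
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) {
        // U+963F (a), U+5317 (bei), U+4E2D (zhong): listed in pinyin order.
        List<String> pinyinOrder = Arrays.asList("\u963f", "\u5317", "\u4e2d");
        // Raw order follows code points instead: U+4E2D, U+5317, U+963F.
        System.out.println(sortRaw(pinyinOrder));
    }
}
```

The raw order is determined by where each character sits in Unicode, which has nothing to do with pronunciation. That is why a collation step is needed for phonetic sorting.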

Jan_Fiedler · February 3, 2012, 2:47pm

You are close but not there yet. To get Chinese sorting right you need the
following 3 additional steps:

1. Configure a sort analyzer for your sort field

...
"fields" : {
    "fieldName" : {"type" : "string", "index" : "analyzed"},
    "sortFieldName" : {"type" : "string", "index" : "analyzed", "analyzer" : "my_chinese_sort"}
}
...

2. Configure the sort analyzer (e.g. in elasticsearch.yml)

index:
  analysis:
    analyzer:
      my_chinese_sort:
        type: custom
        tokenizer: keyword
        filter: [icu_collation_chinese]
    filter:
      icu_collation_chinese:
        type: icu_collation
        language: ch

I am not sure about the actual language identifier to be used for Chinese.
I trust that ICU supports Chinese (I did not try it).

3. Install the ICU plugin

Run the following from your ES home:

bin/plugin -install elasticsearch/elasticsearch-analysis-icu/1.1.0

yin_weifeng · February 6, 2012, 2:06am

Thank you!

You are right: this configuration really does sort the field by Chinese
pinyin. I think ICU is what makes it work.

However, the configuration in elasticsearch.yml had no effect, so I use the
following request to configure the sort analyzer instead:

curl -XPOST localhost:9200/backlog_db -d '{
  "settings" : {
    "index" : {
      "analysis" : {
        "analyzer" : {
          "my_chinese_sort" : {
            "type" : "custom",
            "tokenizer" : "keyword",
            "filter" : ["icu_collation_chinese"]
          }
        },
        "filter" : {
          "icu_collation_chinese" : {
            "type" : "icu_collation",
            "language" : "ch"
          }
        }
      }
    }
  }
}'

Jan_Fiedler · February 6, 2012, 11:17am

I recommend using the analyze API (interactively, via curl) to test whether
your analyzer settings made it correctly into your index. Information
on using the analyze API is here:
http://www.elasticsearch.org/guide/reference/api/admin-indices-analyze.html
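
For anyone who prefers to run that check from Java instead of curl, here is a rough sketch that only builds the request. It assumes the query-parameter form of the analyze API (analyzer name in the URL, raw text as the body) used by Elasticsearch releases of that period. Newer releases expect a JSON body instead, so treat the URL shape as an assumption. The class and method names are hypothetical, and java.net.http requires JDK 11 or later.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class AnalyzeRequestSketch {

    // Build (but do not send) a request to the index's _analyze endpoint.
    public static HttpRequest analyzeRequest(String baseUrl, String index,
                                             String analyzer, String text) {
        String query = "analyzer=" + URLEncoder.encode(analyzer, StandardCharsets.UTF_8);
        URI uri = URI.create(baseUrl + "/" + index + "/_analyze?" + query);
        return HttpRequest.newBuilder(uri)
                .POST(HttpRequest.BodyPublishers.ofString(text, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        // Index and analyzer names are the ones used earlier in this thread.
        HttpRequest request = analyzeRequest(
                "http://localhost:9200", "backlog_db", "my_chinese_sort", "\u4e2d\u6587");
        System.out.println(request.method() + " " + request.uri());
        // To send it against a running node:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

If the analyzer was registered on the index, the response should list the tokens it produced. An error about an unknown analyzer would suggest the settings never reached the index.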

yin_weifeng · February 15, 2012, 8:55am

It now works well on Windows 7, but when I use it in a Linux environment
(Ubuntu 11.10 and CentOS 5.6), the sort results are different even though the
configuration is the same.

What could be the reason?

Thanks!
