Hello,
I am seeing an unexpected sort order for documents in Elasticsearch when sorting on a field of type
icu_collation_keyword. Here are the details:
Steps to Reproduce:
- Create the Index with Mappings:
PUT /test-index
{
  "mappings": {
    "properties": {
      "id422": {
        "type": "text",
        "fields": {
          "collated": {
            "type": "icu_collation_keyword",
            "strength": "tertiary",
            "case_level": true
          }
        }
      }
    }
  }
}
- Index the Documents:
POST /test-index/_doc/1
{
  "id422": "0a11"
}
POST /test-index/_doc/2
{
  "id422": "0A11"
}
POST /test-index/_doc/3
{
  "id422": "0b11"
}
POST /test-index/_doc/4
{
  "id422": "0B11"
}
POST /test-index/_doc/5
{
  "id422": "0c11"
}
POST /test-index/_doc/6
{
  "id422": "0C11"
}
- Search and Sort:
GET /test-index/_search
{
  "sort": [
    {
      "id422.collated": {
        "order": "asc"
      }
    }
  ],
  "_source": ["id422"]
}
Expected Sort Order:
0A11
0B11
0C11
0a11
0b11
0c11
Actual Sort Order:
The order does not match the expected case-sensitive sorting, and the sort values in the response contain unexpected characters.
Sort order in the response:
0a11
0A11
0b11
0B11
0c11
0C11
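To make the difference between the two orderings concrete, here is a minimal Python sketch. It does not use ICU or Elasticsearch at all: the first sort is plain code point order, which happens to match my expected order, and the second is a hand-written key (case-insensitive, lowercase first on ties) that only imitates the order I actually get. The key function is my own assumption for illustration, not how the plugin computes collation keys.

```python
# The six values indexed above.
values = ["0a11", "0A11", "0b11", "0B11", "0c11", "0C11"]

# Plain code point order: uppercase ASCII letters sort before lowercase ones.
expected = sorted(values)

# Illustrative key (an assumption, not ICU): compare case-insensitively first,
# then break ties so the lowercase variant comes before the uppercase one.
observed = sorted(values, key=lambda s: (s.casefold(), s.swapcase()))

print(expected)
print(observed)
```

The first list groups all uppercase variants before all lowercase ones, while the second interleaves each letter's lowercase and uppercase forms, which is the pattern in the response.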
The sort values in the response contain unexpected, unreadable characters such as:
"sort": [
"""কՅ‡ࡀ
Additional Information:
- Elasticsearch version: 8.15.3
- Kibana version: 8.15.3
- ICU Analysis plugin version: 8.15.3
Any insights or suggestions on how to resolve this issue would be greatly appreciated.
Thank you!