Unexpected Behavior with ICU Collation Keyword Sorting

Hello,

I am experiencing unexpected behavior with the sorting order of documents in Elasticsearch using the icu_collation_keyword field type. Here are the details:

Steps to Reproduce:

  1. Create the Index with Mappings:
    PUT /test-index
    {
      "mappings": {
        "properties": {
          "id422": {
            "type": "text",
            "fields": {
              "collated": {
                "type": "icu_collation_keyword",
                "strength": "tertiary",
                "case_level": true
              }
            }
          }
        }
      }
    }

  2. Index the Documents:
    POST /test-index/_doc/1
    {
      "id422": "0a11"
    }

    POST /test-index/_doc/2
    {
      "id422": "0A11"
    }

    POST /test-index/_doc/3
    {
      "id422": "0b11"
    }

    POST /test-index/_doc/4
    {
      "id422": "0B11"
    }

    POST /test-index/_doc/5
    {
      "id422": "0c11"
    }

    POST /test-index/_doc/6
    {
      "id422": "0C11"
    }

  3. Search and Sort:
    GET /test-index/_search
    {
      "sort": [
        {
          "id422.collated": {
            "order": "asc"
          }
        }
      ],
      "_source": ["id422"]
    }

Expected Sort Order:

  1. 0A11
  2. 0B11
  3. 0C11
  4. 0a11
  5. 0b11
  6. 0c11

Actual Sort Order:

The response includes unexpected characters in the sort field, and the order does not match the expected case-sensitive sorting.

Order of the documents in the response:

  1. 0a11
  2. 0A11
  3. 0b11
  4. 0B11
  5. 0c11
  6. 0C11
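
To make the difference between the two orderings concrete, here is a small Python sketch. It does not use ICU or Elasticsearch at all; it only reproduces the two orderings with plain string sorting. The helper name `case_as_tiebreak` is made up for illustration, and treating case purely as a tie-breaker is my assumption about what the collator appears to be doing, not something I have confirmed.

```python
# Illustrative only: plain Python stand-ins for the two orderings, not ICU itself.
values = ["0a11", "0A11", "0b11", "0B11", "0c11", "0C11"]

# Code-point order: every uppercase ASCII letter sorts before every lowercase one.
# This is the order I expected.
code_point_order = sorted(values)


def case_as_tiebreak(value):
    # Hypothetical helper: compare the strings ignoring case first, and only
    # on a tie place the lowercase form ahead of the uppercase form.
    return (value.lower(), value != value.lower())


# This reproduces the order I actually get back from the search.
tiebreak_order = sorted(values, key=case_as_tiebreak)

print(code_point_order)  # ['0A11', '0B11', '0C11', '0a11', '0b11', '0c11']
print(tiebreak_order)    # ['0a11', '0A11', '0b11', '0B11', '0c11', '0C11']
```

So the order I am after looks like a plain code-point comparison, while the order returned looks like a comparison in which the letter decides first and case only breaks ties.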

The sort values in the response also contain unreadable characters, for example:
"sort": [
"""কՅ‡ࡀ

Additional Information:

  • Elasticsearch version: 8.15.3
  • Kibana version: 8.15.3
  • ICU Analysis plugin version: 8.15.3

Any insights or suggestions on how to resolve this issue would be greatly appreciated.

Thank you!
