Kuromoji_tokenizer: sort clause does not seem to work for some specific character combinations

Query:

{
  "query": {
    "bool": {
    }
  },
  "sort": [
    {
      "attribute.sortable": {
        "order": "asc"
      }
    }
  ]
}

Results:

"hits": [
  {
    "_index": "example_1",
    "_type": "example_1",
    "_id": "A2Ff26qFaV",
    "_score": null,
    "_source": {
      "attributes": {
        "attribute": "サヨ"
      }
    },
    "sort": [
      "サヨ"
    ]
  },
  {
    "_index": "example_2",
    "_type": "example_2",
    "_id": "A2Ff26qFaV",
    "_score": null,
    "_source": {
      "attributes": {
        "attribute": "シヨ"
      }
    },
    "sort": [
      "シ"
    ]
  }
]

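For the example_1 doc, the sort value is the full content of the attribute field (サヨ), as expected. For the example_2 doc, the sort value is only the first character (シ) instead of the full string (シヨ).

I have observed this in three cases so far, for the following strings: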

  • シヨ
  • ヲシ
  • シヲ
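
One way to narrow this down might be to check how the tokenizer splits these strings. The request body below is a minimal sketch for the `_analyze` endpoint. It assumes the `attribute.sortable` sub-field uses the plain `kuromoji_tokenizer` with no extra filters, which is an assumption because the mapping is not shown here. If a custom analyzer is configured, the `tokenizer` entry would need to be swapped for that analyzer.

```json
{
  "tokenizer": "kuromoji_tokenizer",
  "text": ["サヨ", "シヨ", "ヲシ", "シヲ"]
}
```

If the response shows シヨ, ヲシ, and シヲ split into several tokens while サヨ stays a single token, that could explain why the sort value for those documents is a single token rather than the whole string.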
