Sorting a string field numerically in elasticsearch

Hi guys. When sorting on a field that's a string, strings that contain
numbers aren't sorted properly.

For example, with these documents with type: string:
{ login_id: "1" }
{ login_id: "1A" }
{ login_id: "1B" }
{ login_id: "12" }
{ login_id: "123" }
{ login_id: "20" }
{ login_id: "22" }
{ login_id: "user" }
{ login_id: "user1" }

When ES sorts on the "login_id" field, the documents are returned in this order:
{ login_id: "1" }
{ login_id: "12" }
{ login_id: "123" }
{ login_id: "20" }
{ login_id: "22" }
{ login_id: "1A" }
{ login_id: "1B" }
{ login_id: "user" }
{ login_id: "user1" }

How can we get ES to return the documents in the following order?
{ login_id: "1" }
{ login_id: "12" }
{ login_id: "20" }
{ login_id: "22" }
{ login_id: "123" }
{ login_id: "1A" }
{ login_id: "1B" }
{ login_id: "user" }
{ login_id: "user1" }
I used:
sort : {
_script: {
type: "string",
script: {
source: "def s = doc['login_id'].value;
def n = org.elasticsearch.common.primitives.Integer.parseInt(s);
if (n != null) { String.format("%010d",n)} else { s }"
},
order: "#{sort[:sort]}"
}
}
But It's error: "reason":"Variable [org] is not defined."}}}]}

What you are trying to do is IMO best covered by the ICU analysis plugin and specifically its keyword field.

I gave it a quick shot and tried the following:

DELETE test

PUT test
{
  "mappings": {
    "_doc": {
      "properties": {
        "login_id": {   
          "type": "text",
          "fields": {
            "sort": {  
              "type": "icu_collation_keyword",
              "index": false,
              "numeric": true,
              "case_level": false
            }
          }
        }
      }
    }
  }
}

POST test/_doc
{ "login_id": "1" }

POST test/_doc
{ "login_id": "1A" }

POST test/_doc
{ "login_id": "1B" }

POST test/_doc
{ "login_id": "12" }

POST test/_doc
{ "login_id": "123" }

POST test/_doc
{ "login_id": "20" }

POST test/_doc
{ "login_id": "22" }

POST test/_doc
{ "login_id": "user" }

POST test/_doc
{ "login_id": "user1" }

POST test/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "login_id.sort": {
        "order": "asc"
      }
    }
  ]
}

This gives you back 1, 1A, 1B, 12, 20, 22, 123, user, user1. I'm not sure if it's (easily) possible to get 1A after 123; that feels rather counterintuitive.

It's not working

This is not a helpful comment: What specifically is not working? Sorting is wrong, you have issues installing the ICU plugin (which I didn't explicitly mention, sorry),...

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.