Sorting on a scripted terms aggregation is lexically instead of numerically

Hey,

maybe this is intended behaviour but I would like to verify. If I am using a terms aggregation with a script and specify sorting by key, it does not take into account when the script is returning an integer value. Example:

PUT test 

PUT test/_bulk?refresh
{ "index" : {} }
{ "date" : "2025-09-26T01:23:45.00Z" }
{ "index" : {} }
{ "date" : "2025-09-26T02:23:45.00Z" }
{ "index" : {} }
{ "date" : "2025-09-26T10:23:45.00Z" }
{ "index" : {} }
{ "date" : "2025-09-26T10:23:45.00Z" }

GET test/_search
{
  "size": 0,
  "aggs": {
    "by_hour": {
      "terms": {
        "order": {
          "_key": "asc"
        },
        "script": {
          "source": "doc['date'].value.getHour()"
        }
      }
    }
  }
}

This returns

"aggregations": {
    "by_hour": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 0,
      "buckets": [
        {
          "key": "1",
          "doc_count": 1
        },
        {
          "key": "10",
          "doc_count": 2
        },
        {
          "key": "2",
          "doc_count": 1
        }
      ]
    }
  }

The docs state:

In this case, the buckets are ordered by the actual term values, such as lexicographic order for keywords or numerically for numbers.

Running the script with Debug.explain() returns an Integer as value.

Sort order is also correct when I index a field as an integer value and use that with the order in a terms aggregation.

This is on Elasticsearch 9.1.3

Is this a scripting issue or am I holding it wrong?

–Alex

Can you use a runtime field instead where you have more control of the data type?

I can see the keys of the aggregation are strings so they seem to be treated as strings.

That works. Should I file a bug for the script or is this considered deprecated?