Is case-insensitive sort on keyword aggregation. ES 7.7.0

I have similar case like here and here2.

I am trying to achive alphabetical sorting (case insensitive) with original value from aggregation.

Simplified example mapping:

PUT test_data
{
    "mappings": {
        "properties": {
            "id": {
                "type": "keyword"
            },
            "name@String": {
                "properties": {
                    "values": {
                        "properties": {
                            "defaultValue": {
                                "type": "text",
                                "fields": {
                                    "keyword": {
                                        "type": "keyword",
                                        "ignore_above": 256
                                    },
                                    "sortword": {
                                        "type": "keyword",
                                        "ignore_above": 256,
                                        "normalizer": "case_insensitive"
                                    }
                                }
                            },
                            "translations": {
                                "properties": {
                                    "0": {
                                        "type": "text",
                                        "fields": {
                                            "keyword": {
                                                "type": "keyword",
                                                "ignore_above": 256
                                            },
                                            "sortword": {
                                                "type": "keyword",
                                                "ignore_above": 256,
                                                "normalizer": "case_insensitive"
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    },
    "settings": {
        "analysis": {
            "normalizer": {
                "case_insensitive": {
                    "filter": [
                        "lowercase"
                    ],
                    "type": "custom"
                }
            }
        }
    }
}

My test data:

POST test_data/_doc/
{
    "name@String": {
        "values": {
            "defaultValue": "CCC",
            "translations": {
                "0": "CCC"
            }
        }
    }
}

POST test_data/_doc/
{
    "name@String": {
        "values": {
            "defaultValue": "bbb",
            "translations": {
                "0": "bbb"
            }
        }
    }
}

POST test_data/_doc/
{
    "name@String": {
        "values": {
            "defaultValue": "BBB",
            "translations": {
                "0": "BBB"
            }
        }
    }
}

POST test_data/_doc/
{
    "name@String": {
        "values": {
            "defaultValue": "aaa",
            "translations": {
                "0": "aaa"
            }
        }
    }
}

POST test_data/_doc/
{
    "name@String": {
        "values": {
            "defaultValue": "AAA",
            "translations": {
                "0": "AAA"
            }
        }
    }
}

Terms aggregation Order paragraph says: Ordering the buckets alphabetically by their terms in an ascending manner.
Here is my aggregation_1:

POST test_data/_search
{
    "query": {
        "match_all": {}
    },
    "aggs": {
        "names": {
            "terms": {
                "field": "name@String.values.translations.0.keyword",
                "order": {
                    "_key": "asc"
                }
            }
        }
    },
    "size": 0
}

Result:

AAA
BBB
CCC
aaa
bbb

Expected (order between cases does not matter):

AAA
aaa
bbb
BBB
CCC

Here is my aggregation_2:

POST test_data/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "names": {
      "terms": {
        "field": "name@String.values.translations.0.sortword",
        "order": { "_key": "asc" }
      }
    }
  }, 
  "size": 0
}

Result:

aaa,
bbb,
ccc

I have tried to do something with pipelines, like sort by sortword and term keyword, but have no success.

If you want to get all the results, why are you not using a search (instead of the aggregation) and sort on the lowercased field?

Hello, xeraa! Thanks for reply. Indeed I have much more complicated model and I'm using search also, when I have to show all documents. And sorting by normalized keyword works perfectly! But I must show aggregations by specific field also. That's why I have this short example.

My main point is insensitive sorting for search and aggregation with saving case of text. Any chance achieve that?

UPD1:

POST test_data/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "sortNames": {
      "terms": {
        "field": "name@String.values.translations.0.sortword",
        "order": { "_key": "asc" }
      },
      "aggs": {
        "names": {
          "terms": {
            "field": "name@String.values.translations.0.keyword"
          }
        }
      }
    }
  }, 
  "size": 0
}

Aggregation above looks like temporary solution with reading values from subaggregation

"sortNames" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [
    {
        "key" : "aaa",
        "doc_count" : 2,
        "names" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 0,
        "buckets" : [
            {
            "key" : "AAA",
            "doc_count" : 1
            },
            {
            "key" : "aaa",
            "doc_count" : 1
            }
        ]
        }
    },
    {
        "key" : "bbb",
        "doc_count" : 3,
        "names" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 0,
        "buckets" : [
            {
            "key" : "BBB",
            "doc_count" : 1
            },
            {
            "key" : "BBb",
            "doc_count" : 1
            },
            {
            "key" : "bbb",
            "doc_count" : 1
            }
        ]
        }
    },
    {
        "key" : "ccc",
        "doc_count" : 1,
        "names" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 0,
        "buckets" : [
            {
            "key" : "CCC",
            "doc_count" : 1
            }
        ]
        }
    }
    ]
}

Not sure that is clear enough to fully understand the tradeoffs here, but maybe collapse is what you're after if you want all the individual results / documents? Collapse search results | Elasticsearch Guide [8.11] | Elastic

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.