Is case-insensitive sort on keyword aggregation. ES 7.7.0

Cular · March 24, 2021, 11:12am

I have similar case like here and here2.

I am trying to achive alphabetical sorting (case insensitive) with original value from aggregation.

Simplified example mapping:

PUT test_data
{
    "mappings": {
        "properties": {
            "id": {
                "type": "keyword"
            },
            "name@String": {
                "properties": {
                    "values": {
                        "properties": {
                            "defaultValue": {
                                "type": "text",
                                "fields": {
                                    "keyword": {
                                        "type": "keyword",
                                        "ignore_above": 256
                                    },
                                    "sortword": {
                                        "type": "keyword",
                                        "ignore_above": 256,
                                        "normalizer": "case_insensitive"
                                    }
                                }
                            },
                            "translations": {
                                "properties": {
                                    "0": {
                                        "type": "text",
                                        "fields": {
                                            "keyword": {
                                                "type": "keyword",
                                                "ignore_above": 256
                                            },
                                            "sortword": {
                                                "type": "keyword",
                                                "ignore_above": 256,
                                                "normalizer": "case_insensitive"
                                            }
                                        }
                                    }
                                }
                            }
                        }
                    }
                }
            }
        }
    },
    "settings": {
        "analysis": {
            "normalizer": {
                "case_insensitive": {
                    "filter": [
                        "lowercase"
                    ],
                    "type": "custom"
                }
            }
        }
    }
}

My test data:

POST test_data/_doc/
{
    "name@String": {
        "values": {
            "defaultValue": "CCC",
            "translations": {
                "0": "CCC"
            }
        }
    }
}

POST test_data/_doc/
{
    "name@String": {
        "values": {
            "defaultValue": "bbb",
            "translations": {
                "0": "bbb"
            }
        }
    }
}

POST test_data/_doc/
{
    "name@String": {
        "values": {
            "defaultValue": "BBB",
            "translations": {
                "0": "BBB"
            }
        }
    }
}

POST test_data/_doc/
{
    "name@String": {
        "values": {
            "defaultValue": "aaa",
            "translations": {
                "0": "aaa"
            }
        }
    }
}

POST test_data/_doc/
{
    "name@String": {
        "values": {
            "defaultValue": "AAA",
            "translations": {
                "0": "AAA"
            }
        }
    }
}

Terms aggregation Order paragraph says: Ordering the buckets alphabetically by their terms in an ascending manner.
Here is my aggregation_1:

POST test_data/_search
{
    "query": {
        "match_all": {}
    },
    "aggs": {
        "names": {
            "terms": {
                "field": "name@String.values.translations.0.keyword",
                "order": {
                    "_key": "asc"
                }
            }
        }
    },
    "size": 0
}

Result:

AAA
BBB
CCC
aaa
bbb

Expected (order between cases does not matter):

AAA
aaa
bbb
BBB
CCC

Here is my aggregation_2:

POST test_data/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "names": {
      "terms": {
        "field": "name@String.values.translations.0.sortword",
        "order": { "_key": "asc" }
      }
    }
  }, 
  "size": 0
}

Result:

aaa,
bbb,
ccc

I have tried to do something with pipelines, like sort by sortword and term keyword, but have no success.

xeraa · March 25, 2021, 1:35am

If you want to get all the results, why are you not using a search (instead of the aggregation) and sort on the lowercased field?

Cular · March 25, 2021, 6:12am

Hello, xeraa! Thanks for reply. Indeed I have much more complicated model and I'm using search also, when I have to show all documents. And sorting by normalized keyword works perfectly! But I must show aggregations by specific field also. That's why I have this short example.

My main point is insensitive sorting for search and aggregation with saving case of text. Any chance achieve that?

UPD1:

POST test_data/_search
{
  "query": {
    "match_all": {}
  },
  "aggs": {
    "sortNames": {
      "terms": {
        "field": "name@String.values.translations.0.sortword",
        "order": { "_key": "asc" }
      },
      "aggs": {
        "names": {
          "terms": {
            "field": "name@String.values.translations.0.keyword"
          }
        }
      }
    }
  }, 
  "size": 0
}

Aggregation above looks like temporary solution with reading values from subaggregation

"sortNames" : {
    "doc_count_error_upper_bound" : 0,
    "sum_other_doc_count" : 0,
    "buckets" : [
    {
        "key" : "aaa",
        "doc_count" : 2,
        "names" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 0,
        "buckets" : [
            {
            "key" : "AAA",
            "doc_count" : 1
            },
            {
            "key" : "aaa",
            "doc_count" : 1
            }
        ]
        }
    },
    {
        "key" : "bbb",
        "doc_count" : 3,
        "names" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 0,
        "buckets" : [
            {
            "key" : "BBB",
            "doc_count" : 1
            },
            {
            "key" : "BBb",
            "doc_count" : 1
            },
            {
            "key" : "bbb",
            "doc_count" : 1
            }
        ]
        }
    },
    {
        "key" : "ccc",
        "doc_count" : 1,
        "names" : {
        "doc_count_error_upper_bound" : 0,
        "sum_other_doc_count" : 0,
        "buckets" : [
            {
            "key" : "CCC",
            "doc_count" : 1
            }
        ]
        }
    }
    ]
}

xeraa · March 26, 2021, 4:44pm

Not sure that is clear enough to fully understand the tradeoffs here, but maybe collapse is what you're after if you want all the individual results / documents? Collapse search results | Elasticsearch Guide [8.11] | Elastic

system · April 23, 2021, 4:45pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Is case-insensitive sort on keyword aggregation possible ? (ES 6.4) Elasticsearch	6	2719	November 16, 2018
Keyword type: aggregation case insensitive Elasticsearch	5	1872	May 19, 2017
Case insensitive sort doesn't work Elasticsearch	9	18635	October 9, 2018
Case insensitive sorting using normalizer Elasticsearch language-clients	4	7405	February 10, 2020
How to do case insensitive sort in ES 5.1+? Elasticsearch	2	2905	January 17, 2017

Is case-insensitive sort on keyword aggregation. ES 7.7.0

Related topics