Is it possible to sort in a custom grouping manner with the Unicode Collation Algorithm in Elasticsearch?

Karthik_Amar · March 8, 2023, 5:11pm

I am working on a phonebook. If the user does not provide a name but fills in only an email, I show the email value in the phonebook (as in macOS Contacts). The display priority is as follows:

Name (if not present) -> Email (if not present) -> Company name (if not present) -> Phone
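
To make the fallback concrete, here is a minimal Python sketch of how the displayed value is picked. The function name display_value and its parameters are hypothetical and only for illustration; the real application code may differ.

```python
from typing import Optional


def display_value(
    name: Optional[str] = None,
    email: Optional[str] = None,
    company: Optional[str] = None,
    phone: Optional[str] = None,
) -> str:
    """Return the first non-blank field in the order name, email, company, phone."""
    for candidate in (name, email, company, phone):
        if candidate and candidate.strip():
            return candidate.strip()
    return ""  # nothing usable was provided
```

For instance, display_value(email="laos@gmail.com", phone="91 9600946491") returns the email, because the name is missing and email comes before phone in the priority.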

The phonebook must be sorted in the following group order:

1. Entries starting with a letter (can be a name, email, or company name; leading punctuation is ignored)
2. Emails starting with numbers
3. Entries starting with currency symbols
4. Names and company names starting with numbers
5. Phone numbers

For example, the phonebook should list contacts in the following order:

Aaron
@home furnitures
Karen
laos@gmail.com
Rahon
Zaro

$1000
£934

1TouchFurnitures
21st century
6Bianca Corps

+1 (232) 323-844
+6 (983) 341-093
91 9600946491
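
As a reference for the target behaviour, the grouping above can be sketched in plain Python. This is only an illustration under stated assumptions: sort_key, the kind labels and the group constants are hypothetical names, the sketch assumes the source field of each value is known (needed to tell a phone number from a company name that starts with a digit), and plain code point comparison stands in for ICU collation within each group.

```python
import unicodedata

# Hypothetical group ranks, in the order the phonebook should show them.
(GROUP_ALPHA, GROUP_DIGIT_EMAIL, GROUP_CURRENCY,
 GROUP_DIGIT_NAME, GROUP_PHONE, GROUP_OTHER) = range(6)


def _significant(value: str) -> str:
    """Drop leading characters that are neither alphanumeric nor a currency symbol."""
    for i, ch in enumerate(value):
        if ch.isalnum() or unicodedata.category(ch) == "Sc":
            return value[i:]
    return ""


def sort_key(value: str, kind: str):
    """Return (group rank, normalized text); kind is 'name', 'email', 'company' or 'phone'."""
    text = _significant(value).casefold()
    if kind == "phone":
        return (GROUP_PHONE, text)
    first = text[:1]
    if first.isalpha():
        group = GROUP_ALPHA
    elif first.isdigit():
        # Emails starting with a digit come before names/companies starting with a digit.
        group = GROUP_DIGIT_EMAIL if kind == "email" else GROUP_DIGIT_NAME
    elif first and unicodedata.category(first) == "Sc":
        group = GROUP_CURRENCY
    else:
        group = GROUP_OTHER  # empty or unclassified values go last
    return (group, text)


contacts = [
    ("91 9600946491", "phone"), ("Zaro", "name"), ("$1000", "name"),
    ("21st century", "company"), ("@home furnitures", "company"),
    ("laos@gmail.com", "email"), ("+1 (232) 323-844", "phone"),
    ("Aaron", "name"), ("1TouchFurnitures", "company"), ("£934", "name"),
    ("Karen", "name"), ("6Bianca Corps", "company"), ("Rahon", "name"),
    ("+6 (983) 341-093", "phone"), ("1geni@gmail.com", "email"),
]
ordered = [value for value, kind in sorted(contacts, key=lambda c: sort_key(*c))]
```

Sorting on the (group, text) tuple keeps each group contiguous. If everything has to live in one field, the two parts could presumably be flattened into a single string with the group rank as a prefix, computed at index time, though I have not verified how that interacts with the collation rules.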

I am currently using the following mapping in Elasticsearch. The sort order rules (rules) are defined on a single field that exists only for sorting (sortKey), through a sub-field of type icu_collation_keyword:

    {
      "mappings": {
        "properties": {
          "sortKey": {
            "type": "text",
            "fields": {
              "sort": {
                "type": "icu_collation_keyword",
                "index": false,
                "rules": "[reorder Latn digit currency symbol space punct others]",
                "alternate": "shifted"
              }
            }
          }
        }
      }
    }

I get the following sort order when sorting on the sortKey.sort field:

Aaron
@home furnitures
Karen
laos@gmail.com
Rahon
Zaro

1geni@gmail.com
1TouchFurnitures
2mud@gmail.com
21st century
6Bianca Corps
91 9600946491
99voli@gmail.com

$1000
£934

+1 (232) 323-844
+6 (983) 341-093

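I can see why the sorting comes out in the above order: the reorder rule places the character groups in the sequence Latn, digit, currency, symbol, space, punct, others. As I understand it, the collation algorithm gives each character a sort weight based on the collation rules we have defined, so every entry starting with a digit lands in the same block regardless of whether it is an email, a company name, or a phone number.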

Apart from the reorder setting used above, I know we can write a chain of rules as an alternative for collated sorting. Can someone please tell me whether it is possible to bend the Unicode Collation Algorithm to match the above use case?

Note: One option is to keep a separate Elasticsearch field for each group (just for sorting) and, based on the number of records fetched with the limit, chain multiple queries to build a cursor mechanism. But I don't think that is a proper way of going about it, and I am looking to see whether it can be achieved with just one field.

Thank you so much for taking time to read this. Any help is much appreciated.

system · April 5, 2023, 5:12pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
