I am working on a phonebook in which a contact that has no name but does have an email is listed under the email value (as in the Mac Contacts app). The value to display is chosen with the following priority:
Name (if not present) -> Email (if not present) -> Company name (if not present) -> Phone
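The fallback chain above can be sketched in Python as follows. The dictionary keys (`name`, `email`, `company`, `phone`) and the function name are hypothetical, chosen only for illustration:

```python
def display_value(contact):
    """Return the first non-empty field in priority order, or "" if none is set."""
    # Hypothetical field names; the order mirrors the priority chain above.
    for field in ("name", "email", "company", "phone"):
        value = (contact.get(field) or "").strip()
        if value:
            return value
    return ""
```

For instance, a contact that only has an email and a phone number would be listed under its email.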
The phonebook must be sorted in the following order:
- Entries starting with a letter come first (name, email, or company name, with punctuation ignored)
- Emails starting with numbers
- Currency symbols
- Names and company names starting with numbers
- Phone numbers
For example, the phonebook should list contacts in the following order:
- Aaron
- @home furnitures
- Karen
- laos@gmail.com
- Rahon
- Zaro
- $1000
- £934
- 1TouchFurnitures
- 21st century
- 6Bianca Corps
- +1 (232) 323-844
- +6 (983) 341-093
- 91 9600946491
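To make the target order concrete, here is a minimal Python sketch that reproduces it outside Elasticsearch. It is not ICU collation, only an illustration of the grouping: each entry gets a bucket number followed by a normalized string. The `kind` tag (which source field the displayed value came from) and all function names are assumptions introduced for this sketch.

```python
import unicodedata

# Bucket numbers follow the required group order (lower sorts first).
LETTERS, DIGIT_EMAIL, CURRENCY, DIGIT_NAME, PHONE = range(5)

def _significant(text):
    """Casefold and keep only letters, digits and currency symbols."""
    return "".join(
        ch for ch in text.casefold()
        if ch.isalnum() or unicodedata.category(ch) == "Sc"
    )

def sort_key(value, kind):
    """Return (bucket, normalized text) for one phonebook entry.

    `kind` is a hypothetical tag naming the field the value came from:
    "name", "email", "company" or "phone".
    """
    text = _significant(value)
    if kind == "phone":
        return (PHONE, text)
    first = text[:1]
    if first.isdigit():
        # Emails starting with a digit sort before currency symbols;
        # names and company names starting with a digit sort after them.
        return (DIGIT_EMAIL if kind == "email" else DIGIT_NAME, text)
    if first and unicodedata.category(first) == "Sc":
        return (CURRENCY, text)
    return (LETTERS, text)
```

Sorting `(value, kind)` pairs with `sorted(entries, key=lambda e: sort_key(*e))` yields the order listed above. A key like this could be computed at index time and stored in a single sort field, but that is a different approach from tailoring the collation rules.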
I am currently using the following Elasticsearch mapping, in which the sort order rules (rules) are defined on a single field used only for sorting (sortKey), through a sub-field of type icu_collation_keyword:
{
  "mappings": {
    "properties": {
      "sortKey": {
        "type": "text",
        "fields": {
          "sort": {
            "type": "icu_collation_keyword",
            "index": false,
            "rules": "[reorder Latn digit currency symbol space punct others]",
            "alternate": "shifted"
          }
        }
      }
    }
  }
}
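For reference, a sort on that sub-field can be expressed with a request body like the one built below. This is only a sketch: the `match_all` query, the page size, and the function name are assumptions, since the actual query is not shown here.

```python
import json

def build_search_body(size=50):
    """Build a search body that sorts on the collation sub-field from the mapping."""
    return {
        "size": size,                   # assumed page size
        "query": {"match_all": {}},     # assumed query, for illustration only
        "sort": [{"sortKey.sort": {"order": "asc"}}],
    }

if __name__ == "__main__":
    print(json.dumps(build_search_body(), indent=2))
```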
This is the sort order I get when sorting on the sortKey.sort field:
- Aaron
- @home furnitures
- Karen
- laos@gmail.com
- Rahon
- Zaro
- 1geni@gmail.com
- 1TouchFurnitures
- 2mud@gmail.com
- 21st century
- 6Bianca Corps
- 91 9600946491
- 99voli@gmail.com
- $1000
- £934
- +1 (232) 323-844
- +6 (983) 341-093
I can see why the results come out in this order: the rule reorders the groups as Latn, digit, currency, symbol, space, punct, others. As I understand it, the collation algorithm assigns each character a sort key based on the collation rules we have defined.
Apart from the reorder setting used above, I know that a custom rule chain can be written as an alternative way to tailor the collation. Can someone please tell me whether the Unicode collation algorithm can be tailored to match the use case above?
Note: One option is to keep a separate Elasticsearch field for each group (just for sorting) and, depending on the number of records fetched with the limit, chain multiple queries to implement the cursor. I don't think that is a proper way of going about it, though, and I would like to know whether this can be achieved with just one field.
Thank you so much for taking the time to read this. Any help is much appreciated.