How to write a collation rule for an icu_collation_keyword field so that letters have the highest precedence?

Instead of using the locale options (such as language), I want to use the rules parameter to customise the sort behaviour so that letters sort before everything else.

Given the text values $1232, Abi, £7232, 87343 and Karthik, I want a collation rule that produces the following sort order:

Abi
Karthik
$1232
£7232
87343

So far I have tried the following mapping:

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "name": {   
        "type": "text",
        "fields": {
          "sort": {  
            "type": "icu_collation_keyword",
            "index": false,
            "language" : "en"
          }
        }
      }
    }
  }
}

The mapping above, which uses the locale option (language) instead of the rules option, sorts the values in the following order:

$1232
£7232
87343
Abi
Karthik

Can someone please tell me how to write a collation rule so that letters take precedence over everything else, in this order?

letters -> space -> punctuation -> symbol -> currency -> digits -> others

Thanks in advance.

Update: Hi all, I tried using the following collation rule in the mapping, since it worked in a Java collator code snippet.

"< a,A < b,B < c,C < d,D < e,E < f,F < g,G < h,H < i,I < j,J < k,K < l,L < m,M < n,N < o,O < p,P < q,Q < r,R < s,S < t,T< u,U < v,V < w,W < x,X < y,Y < z,Z"

But using the same rule in the Elasticsearch mapping throws a parse exception.

Mapping

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "name": {   
        "type": "text",
        "fields": {
          "sort": {  
            "type": "icu_collation_keyword",
            "index": false,
            "rules" :   "< a,A < b,B < c,C < d,D < e,E < f,F < g,G < h,H < i,I < j,J < k,K < l,L < m,M < n,N < o,O < p,P < q,Q < r,R < s,S < t,T< u,U < v,V < w,W < x,X < y,Y < z,Z"
          }
        }
      }
    }
  }
}

Exception

{
  "error" : {
    "root_cause" : [
      {
        "type" : "mapper_parsing_exception",
        "reason" : "Failed to parse mapping [_doc]: Failed to parse collation rules"
      }
    ],
    "type" : "mapper_parsing_exception",
    "reason" : "Failed to parse mapping [_doc]: Failed to parse collation rules",
    "caused_by" : {
      "type" : "illegal_argument_exception",
      "reason" : "Failed to parse collation rules",
      "caused_by" : {
        "type" : "parse_exception",
        "reason" : "expected a reset or setting or comment at index 0 near \"!< a,A < b,B < c\""
      }
    }
  },
  "status" : 400
}

Can someone please tell me what I am doing wrong here?

I found the reason for the parse exception: the rule syntax that worked for me is the one accepted by the collator in the java.text package, whereas Elasticsearch uses the collator from the com.ibm.icu package (ICU), which expects a different rule syntax. Can someone please tell me the ICU equivalent of the above rule ("< a,A < b,B < c,C < d,D < e,E < f,F < g,G < h,H < i,I < j,J < k,K < l,L < m,M < n,N < o,O < p,P < q,Q < r,R < s,S < t,T< u,U < v,V < w,W < x,X < y,Y < z,Z")?
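
For reference, the parser message above says that an ICU rule string has to start with a reset (&), a setting ([...]) or a comment, which is why a string starting with < is rejected. Below is a minimal sketch of one possible direction, not a verified answer. It assumes the ICU [reorder ...] setting, with the script code Latn and the group names space, punct, symbol, currency, digit and others, listed in the order given earlier in this post. It has not been tested against this index, and Latn only covers Latin-script letters.

```
# Sketch only: the reorder groups follow the order requested in the question.
# Assumes the analysis-icu plugin is installed and the index does not exist yet.
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "fields": {
          "sort": {
            "type": "icu_collation_keyword",
            "index": false,
            "rules": "[reorder Latn space punct symbol currency digit others]"
          }
        }
      }
    }
  }
}
```

The idea is that a reorder setting moves whole groups of characters relative to each other without listing every letter, which is what the java.text-style rule was trying to do one letter at a time.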

Update: Hi all, the following collation rule setting works for my use case.

PUT my-index-000001
{
  "mappings": {
    "properties": {
      "name": {   
        "type": "text",
        "fields": {
          "sort": {  
            "type": "icu_collation_keyword",
            "index": false,
            "rules" :   "[reorder Latn digit space punct currency others]",
            "alternate" : "shifted"
          }
        }
      }
    }
  }
}
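
To check the resulting order, the sample values from the question can be indexed and then sorted on the name.sort sub-field. This is a sketch, assuming the index was just created with the mapping above and is still empty. The refresh parameter is only there so that the documents are searchable immediately.

```
# Index the five sample values from the question
POST my-index-000001/_bulk?refresh
{ "index": {} }
{ "name": "$1232" }
{ "index": {} }
{ "name": "Abi" }
{ "index": {} }
{ "name": "£7232" }
{ "index": {} }
{ "name": "87343" }
{ "index": {} }
{ "name": "Karthik" }

# Sort ascending on the collation sub-field
GET my-index-000001/_search
{
  "query": { "match_all": {} },
  "sort": [ { "name.sort": "asc" } ]
}
```

The order of the name values in the returned hits can then be compared with the desired order listed at the top of the post.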

Thank you! The case can be closed.
