Sort not working as expected

Hi,
I have an index with Text field along with the corresponding Keyword field.
Index uses custom analyser with Keyword tokenizer using lowercase filter.

this is the index structure -

{
  "settings": {
    "analysis": {
      "analyzer": {
        "CustomTokenizer": {
          "filter": [
            "lowercase"
          ],
          "tokenizer": "keyword",
          "type": "custom"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "Name": {
        "type": "text",
        "analyzer": "CustomTokenizer",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "Description": {
        "type": "text",
        "analyzer": "CustomTokenizer",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "CreationUser": {
        "type": "text",
        "analyzer": "CustomTokenizer",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      },
      "CreationDate": {
        "type": "text",
        "analyzer": "CustomTokenizer",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }	
	}
  }
}

These are the values of the name field in the index -

TESTDOC-000001-C
T-RT-001
T-RT-002
TESTDOC-000001-B
T-RT-003
TESTDOC-000001-D

When i sort using Name field asc it gives me this order

GET default_index001/_search
{
  "query": {
    "Match_all": {
    }
  }, 
  "sort": [
    {
      "Name.keyword": {
        "order": "asc"
      }
    }
  ],
  "size": 100
}

when i execute above query it returns the results in this oder.

T-RT-001
T-RT-002
T-RT-003
TESTDOC-000001-B
TESTDOC-000001-C
TESTDOC-000001-D

but the same sort using SQL on Database table gives me this order -

TESTDOC-000001-B
TESTDOC-000001-C
TESTDOC-000001-D
T-RT-001
T-RT-002
T-RT-003

how do i make elastic to return the oder same as database?

Hi there @gjahagir - it looks to me from this example that you want the - character considered during sorting. You can extend your custom analyzer to handle the - character using a character filter.

Thanks @Kathleen_DeRusso i realized that sort order is controlled by database collation setting. For example if the collation setting is set to SQL_Latin1_General_CP1_CI_AS the sort order is this way -
TESTDOC-000001-B
TESTDOC-000001-C
TESTDOC-000001-D
T-RT-001
T-RT-002
T-RT-003

if the collation setting is Latin1_General_BIN then the sort order is like this -
T-RT-001
T-RT-002
T-RT-003
TESTDOC-000001-B
TESTDOC-000001-C
TESTDOC-000001-D

same as Elasticsearch.

It looks like special characters are treated differently by database collation setting.

Is there any such option in Elastic?

We don't support this natively out of the box, but you may be able to do what you're looking for via script based sorting if my previous suggestion on char filters doesn't work for you.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.