Elasticsearch version (bin/elasticsearch --version): 5.3
JVM version (java -version): java 10.0.1 2018-04-17
Description of the problem including expected versus actual behavior:
Hello,
I am trying to achieve case-insensitive sorting using a lowercase analyzer and a keyword field. However, the results are not as expected. I am hesitant to use normalizers because they are an experimental feature. I've tried using the default analyzer and also explicitly setting the standard analyzer.
There was a similar issue, https://github.com/elastic/elasticsearch/issues/22410, but it wasn't much help. Can you please help clarify whether what I expect in the results is the correct ES behavior? If not, how can I correctly achieve case-insensitive sorting?
Appreciate any help.
Steps to reproduce:
- Create Index
PUT http://localhost:9200/testindex
{
  "settings": {
    "analysis": {
      "analyzer": {
        "case_insensitive": {
          "tokenizer": "lowercase"
        }
      }
    }
  }
}
- Mapping
PUT http://localhost:9200/testindex/_mapping/testmapping
{
  "properties": {
    "Id": {
      "type": "keyword"
    },
    "Name": {
      "type": "text",
      "analyzer": "case_insensitive",
      "fields": {
        "keyword": {
          "type": "keyword"
        }
      }
    }
  }
}
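To make the effect of this mapping concrete, here is a rough Python model (not Elasticsearch code) of the terms stored for one Name value. It assumes the lowercase tokenizer splits on non-letter characters and lowercases each term, approximated with str.isalpha(). It also reflects that the keyword sub-field, which has no normalizer here, keeps the original value unchanged, because the parent field's analyzer is not applied to it. lowercase_terms and index_name are hypothetical helper names.

```python
from typing import Dict, List


def lowercase_terms(text: str) -> List[str]:
    """Simplified lowercase tokenizer: split on non-letters, lowercase terms.

    Assumption: str.isalpha() stands in for Lucene's letter test.
    """
    terms: List[str] = []
    current: List[str] = []
    for ch in text:
        if ch.isalpha():
            current.append(ch.lower())
        elif current:  # a non-letter character ends the current term
            terms.append("".join(current))
            current = []
    if current:  # flush a term that runs to the end of the text
        terms.append("".join(current))
    return terms


def index_name(value: str) -> Dict[str, List[str]]:
    """Rough model of the terms stored for one Name value.

    Name (text, analyzer case_insensitive) gets the analyzed terms.
    Name.keyword (keyword, no normalizer) gets the original value as a
    single, unmodified term.
    """
    return {
        "Name": lowercase_terms(value),
        "Name.keyword": [value],
    }


print(index_name("III-bbb"))
# {'Name': ['iii', 'bbb'], 'Name.keyword': ['III-bbb']}
```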
- Documents
PUT http://localhost:9200/testindex/testmapping/1
{
  "Id": 1,
  "Name": "III-bbb"
}
PUT http://localhost:9200/testindex/testmapping/8
{
  "Id": 8,
  "Name": "III-ccc"
}
PUT http://localhost:9200/testindex/testmapping/2
{
  "Id": 2,
  "Name": "iii-aaa"
}
- Search query
POST http://localhost:9200/testindex/testmapping/_search
{
  "query": {
    "match": {
      "Name": "iii"
    }
  },
  "sort": [
    {
      "Name.keyword": {
        "order": "asc"
      }
    }
  ]
}
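A rough Python model of why this match query returns all three documents: the query text is analyzed with the same lowercase tokenizer, and every document's analyzed Name contains the term iii. This is a simplification that ignores scoring and assumes the match query's default OR operator; lowercase_terms, DOCS, and match_ids are hypothetical names.

```python
from typing import Dict, List


def lowercase_terms(text: str) -> List[str]:
    """Simplified lowercase tokenizer: split on non-letters, lowercase terms."""
    terms: List[str] = []
    current: List[str] = []
    for ch in text:
        if ch.isalpha():
            current.append(ch.lower())
        elif current:  # a non-letter character ends the current term
            terms.append("".join(current))
            current = []
    if current:
        terms.append("".join(current))
    return terms


# Documents from the reproduction steps: _id -> Name
DOCS: Dict[str, str] = {"1": "III-bbb", "8": "III-ccc", "2": "iii-aaa"}


def match_ids(query: str, docs: Dict[str, str]) -> List[str]:
    """Ids of docs whose analyzed Name shares at least one term with the
    analyzed query (OR semantics; relevance scoring is not modeled)."""
    query_terms = set(lowercase_terms(query))
    return sorted(
        doc_id
        for doc_id, name in docs.items()
        if query_terms & set(lowercase_terms(name))
    )


print(match_ids("iii", DOCS))  # ['1', '2', '8']
```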
- Actual search result
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": null,
    "hits": [
      {
        "_index": "testindex",
        "_type": "testmapping",
        "_id": "1",
        "_score": null,
        "_source": {
          "Id": 1,
          "Name": "III-bbb"
        },
        "sort": [
          "III-bbb"
        ]
      },
      {
        "_index": "testindex",
        "_type": "testmapping",
        "_id": "8",
        "_score": null,
        "_source": {
          "Id": 8,
          "Name": "III-ccc"
        },
        "sort": [
          "III-ccc"
        ]
      },
      {
        "_index": "testindex",
        "_type": "testmapping",
        "_id": "2",
        "_score": null,
        "_source": {
          "Id": 2,
          "Name": "iii-aaa"
        },
        "sort": [
          "iii-aaa"
        ]
      }
    ]
  }
}
- Expected result:
Order: Id 2, Id 1, Id 8 (iii-aaa, III-bbb, III-ccc)
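The gap between the actual and the expected order can be reproduced in plain Python, assuming as a simplification that sorting on a keyword field compares the stored values as raw UTF-8 bytes. In ASCII, uppercase I (0x49) sorts before lowercase i (0x69), so both III-... values come before iii-aaa; only a lowercased sort key produces the expected order. DOCS, keyword_order, and case_insensitive_order are hypothetical names.

```python
from typing import Dict, List

# Documents from the reproduction steps: _id -> Name
DOCS: Dict[str, str] = {"1": "III-bbb", "8": "III-ccc", "2": "iii-aaa"}


def keyword_order(docs: Dict[str, str]) -> List[str]:
    """Ids sorted by the raw UTF-8 bytes of Name (model of sorting on Name.keyword)."""
    return sorted(docs, key=lambda doc_id: docs[doc_id].encode("utf-8"))


def case_insensitive_order(docs: Dict[str, str]) -> List[str]:
    """Ids sorted by the lowercased Name (the order expected in this report)."""
    return sorted(docs, key=lambda doc_id: docs[doc_id].lower())


print(keyword_order(DOCS))           # ['1', '8', '2']
print(case_insensitive_order(DOCS))  # ['2', '1', '8']
```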
Based on the _analyze API result below, the text is being tokenized in lowercase, so the results should be ordered alphabetically rather than lexicographically:
POST http://localhost:9200/testindex/_analyze
{
  "field": "testmapping.Name",
  "text": "IIII-bbb"
}
Result:
{
  "tokens": [
    {
      "token": "iiii",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "bbb",
      "start_offset": 5,
      "end_offset": 8,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}
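The tokens above can be approximated with a small Python model that splits on non-letter characters, lowercases each term, and records start and end offsets. This is an illustration only, assuming str.isalpha() is close enough to Lucene's letter test for ASCII input; it models terms and offsets, not the token type. These are analyzer tokens: a keyword field such as Name.keyword is not run through an analyzer, so it keeps the original value. tokenize_with_offsets is a hypothetical name.

```python
from typing import List, Optional, Tuple


def tokenize_with_offsets(text: str) -> List[Tuple[str, int, int]]:
    """Split text on non-letter characters and lowercase each term.

    Returns (term, start_offset, end_offset) tuples; end_offset is
    exclusive, like the offsets reported by the _analyze API.
    """
    tokens: List[Tuple[str, int, int]] = []
    start: Optional[int] = None  # index where the current term began
    for i, ch in enumerate(text):
        if ch.isalpha():
            if start is None:
                start = i
        elif start is not None:  # a non-letter ends the current term
            tokens.append((text[start:i].lower(), start, i))
            start = None
    if start is not None:  # term running to the end of the text
        tokens.append((text[start:].lower(), start, len(text)))
    return tokens


print(tokenize_with_offsets("IIII-bbb"))
# [('iiii', 0, 4), ('bbb', 5, 8)]
```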