Hi all,
I have an index that was created with the following configuration:
{
  "settings": {
    "analysis": {
      "analyzer": {
        "std_asciifolding": {
          "tokenizer": "standard",
          "filter": [ "std_asciifold_preserve", "lowercase" ]
        }
      },
      "filter": {
        "std_asciifold_preserve": {
          "type": "asciifolding",
          "preserve_original": true
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "display_name": {
        "type": "text",
        "analyzer": "std_asciifolding"
      }
    }
  }
}
The idea is that the display_name field is searchable in a case- and diacritics-insensitive way, to make it easier to search for non-English names.
One of the documents in this index looks like this:
{
  "user_id": "XXXXXXX",
  "display_name": "Ga\u0301o foo",
  "avatar_url": null
}
Here the character á is a lowercase a followed by U+0301 (Combining Acute Accent), i.e. the decomposed form rather than the single precomposed code point U+00E1.
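To make the exact contents of the field concrete, here is a small Python sketch that rebuilds the display name from its code points and compares it with its NFC-composed form. The variable names are just for illustration, and this only shows what the raw string looks like; it makes no claim about what the analyzer does with it.

```python
import unicodedata

# Illustrative reconstruction of the stored value:
# "a" followed by U+0301 (Combining Acute Accent), as described above.
display_name = "Ga\u0301o foo"

# List each code point so the decomposed sequence is visible.
code_points = [f"U+{ord(ch):04X}" for ch in display_name]

# NFC normalization composes "a" + U+0301 into the single code point U+00E1.
composed = unicodedata.normalize("NFC", display_name)

print(code_points[:4])
print(len(display_name), len(composed))
print(display_name == composed)
```

The two strings render identically but are different code point sequences, so a plain string comparison treats them as distinct.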
When I try searching for this user in my index, the following query pulls up the document:
{
  "query": {
    "match_phrase_prefix": {
      "display_name": {
        "query": "Ga"
      }
    }
  }
}
However, the following query does not:
{
  "query": {
    "match_phrase_prefix": {
      "display_name": {
        "query": "Gao"
      }
    }
  }
}
This leads to confusing behavior: at first it looks like I can find my user without typing the combining acute accent, but once I finish typing their display name the result suddenly disappears.
Am I missing something here?