Following the article "You Have an Accent" (https://www.elastic.co/guide/en/elasticsearch/guide/current/asciifolding-token-filter.html), I added the following analysis settings to my index:
PUT /blog
{
  "settings": {
    "analysis": {
      "analyzer": {
        "folding": {
          "tokenizer": "standard",
          "filter": [ "lowercase", "asciifolding" ]
        }
      }
    }
  }
}
According to the article, testing the analyzer like this:
GET /my_index/_analyze?analyzer=folding
My œsophagus caused a débâcle
should yield this:
my, oesophagus, caused, a, debacle
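For reference, this is my understanding of what the folding analyzer is supposed to do, as a rough Python sketch. It is only an approximation: the regex tokenizer, the NFKD-based folding, and the small `LIGATURES` table are my own stand-ins, not Elasticsearch's actual standard tokenizer or asciifolding implementation.

```python
import re
import unicodedata

# Hypothetical stand-in table for characters that have no Unicode decomposition;
# the real asciifolding filter covers far more characters than this.
LIGATURES = {"œ": "oe", "Œ": "OE", "æ": "ae", "Æ": "AE", "ß": "ss"}

def fold(token):
    """Approximate ASCII folding: expand ligatures, then strip combining marks."""
    expanded = "".join(LIGATURES.get(ch, ch) for ch in token)
    decomposed = unicodedata.normalize("NFKD", expanded)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def analyze(text):
    """Split into word tokens, lowercase each one, then fold it to ASCII."""
    # Normalize to NFC first so accented letters are single word characters.
    text = unicodedata.normalize("NFC", text)
    return [fold(tok.lower()) for tok in re.findall(r"\w+", text)]

print(analyze("My œsophagus caused a débâcle"))
# ['my', 'oesophagus', 'caused', 'a', 'debacle']
```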
But this is not what I get. Instead I get the following output:
{
  "tokens": [
    {
      "token": "my",
      "start_offset": 0,
      "end_offset": 2,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "sophagus",
      "start_offset": 4,
      "end_offset": 12,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "caused",
      "start_offset": 13,
      "end_offset": 19,
      "type": "<ALPHANUM>",
      "position": 3
    },
    {
      "token": "a",
      "start_offset": 20,
      "end_offset": 21,
      "type": "<ALPHANUM>",
      "position": 4
    },
    {
      "token": "d",
      "start_offset": 22,
      "end_offset": 23,
      "type": "<ALPHANUM>",
      "position": 5
    },
    {
      "token": "b",
      "start_offset": 24,
      "end_offset": 25,
      "type": "<ALPHANUM>",
      "position": 6
    },
    {
      "token": "cle",
      "start_offset": 26,
      "end_offset": 29,
      "type": "<ALPHANUM>",
      "position": 7
    }
  ]
}
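Looking at the offsets, each of œ, é and â seems to occupy exactly one character position but never ends up inside a token. The same tokens and offsets come out of the Python sketch below, which assumes (and this is only a guess, not a confirmed cause) that every non-ASCII character was replaced by a single placeholder such as `?` before the text reached the tokenizer. The regex is again a rough stand-in for the standard tokenizer, and the function name is made up for illustration.

```python
import re

def tokens_after_ascii_replacement(text):
    """Replace each non-ASCII character with '?', then tokenize and lowercase."""
    # Assumption: the characters are lost before analysis, one placeholder each.
    mangled = text.encode("ascii", errors="replace").decode("ascii")
    return [
        (m.group().lower(), m.start(), m.end())
        for m in re.finditer(r"[A-Za-z0-9]+", mangled)
    ]

for token, start, end in tokens_after_ascii_replacement("My œsophagus caused a débâcle"):
    print(token, start, end)
```

If that guess is right, the analyzer itself may be fine and the text is being mangled somewhere between my client and Elasticsearch.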
Any idea why œsophagus and débâcle get broken up this way on my machine?