Hi,
I have an issue with how the asciifolding filter works.
Let me explain.
I have this analyzer:
"folding": {
  "tokenizer": "standard",
  "filter": ["asciifolding"]
}
I expected the tokens for the French words pate and pâte to be the same: pate, without the accent.
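To make that expectation concrete, here is a small Python sketch of the folding I have in mind. It only approximates the idea (Unicode decomposition, then dropping the combining marks); it is not the Elasticsearch implementation, and fold_to_ascii is just a name made up for this illustration.

```python
import unicodedata


def fold_to_ascii(text: str) -> str:
    """Rough stand-in for accent folding (hypothetical helper, not the ES filter)."""
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Keep only the characters that are not combining marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(fold_to_ascii("pate"))
print(fold_to_ascii("pâte"))
```

Both calls give the same string, which is the behaviour I expected from the analyzer.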
But that is not what I get:
GET /cac/_analyze?analyzer=folding&text=pate
{
  "tokens": [
    {
      "token": "pate",
      "start_offset": 0,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}
and
GET /cac/_analyze?analyzer=folding&text=pâte
{
  "tokens": [
    {
      "token": "p",
      "start_offset": 0,
      "end_offset": 1,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "te",
      "start_offset": 2,
      "end_offset": 4,
      "type": "<ALPHANUM>",
      "position": 2
    }
  ]
}
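One guess I have not been able to confirm: the offsets (0-1 and 2-4) suggest the text was still four characters long, but that the character at offset 1 was not seen as a letter. That could happen if the â in the query string reached the server as a single Latin-1 byte rather than as UTF-8. The Python sketch below only illustrates that hypothesis. The regex split is a crude stand-in for the standard tokenizer, not its real algorithm, and nothing here is taken from the Elasticsearch code.

```python
import re
from urllib.parse import quote, unquote_to_bytes

word = "pâte"

# The same word percent-encoded two ways for a query string.
utf8_form = quote(word, encoding="utf-8")      # p%C3%A2te
latin1_form = quote(word, encoding="latin-1")  # p%E2te

# Assumption: the server decodes the query string as UTF-8.
# A lone 0xE2 byte is not valid UTF-8, so it becomes U+FFFD.
received = unquote_to_bytes(latin1_form).decode("utf-8", errors="replace")

# Crude approximation of tokenizing: split on anything that is not a word character.
tokens = re.findall(r"\w+", received)

print(utf8_form, latin1_form)
print(len(received), tokens)
```

If this is what is happening, sending the text percent-encoded as UTF-8 (p%C3%A2te), or in the request body instead of the URL, might be worth trying.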
Why do I get two tokens for the second word, the one with the accent? I have searched a lot but found nothing, and all my tests give the same wrong result.
Thank you for your help !