Issue with asciiFolding filter and accents

jb95 · September 30, 2015, 3:19pm

Hi,

I have an issue with how the asciifolding filter works.

Let me explain:

I have an analyzer:

"folding": {
  "tokenizer": "standard",
  "filter": ["asciifolding"]
}

I expected the tokens for the French words pate and pâte to be the same: pate, without the accent.

But they are not:

GET /cac/_analyze?analyzer=folding&text=pate
{
   "tokens": [
      {
         "token": "pate",
         "start_offset": 0,
         "end_offset": 4,
         "type": "<ALPHANUM>",
         "position": 1
      }
   ]
}

and

GET /cac/_analyze?analyzer=folding&text=pâte
{
   "tokens": [
      {
         "token": "p",
         "start_offset": 0,
         "end_offset": 1,
         "type": "<ALPHANUM>",
         "position": 1
      },
      {
         "token": "te",
         "start_offset": 2,
         "end_offset": 4,
         "type": "<ALPHANUM>",
         "position": 2
      }
   ]
}

Why do I get two tokens for the second word, the one with the accent? I have searched a lot but found nothing, and all my tests fail!

Thank you for your help !
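
The expectation in the question (pate and pâte producing the same token) can be sketched outside Elasticsearch. This is only a rough approximation in Python: it strips combining marks after Unicode decomposition, which is not how Lucene's ASCII folding filter is implemented, and the helper name `fold_to_ascii` is made up for illustration.

```python
import unicodedata

def fold_to_ascii(text):
    """Rough stand-in for accent folding (hypothetical helper, not Lucene's filter)."""
    # NFKD splits "â" into "a" plus a combining circumflex (U+0302).
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keep the base characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(fold_to_ascii("pâte"))  # pate
print(fold_to_ascii("pate"))  # pate
```

If the text reaches the filter intact, both words fold to the same token. The split into "p" and "te" shown above happens before any folding could apply.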

dadoonet · September 30, 2015, 3:49pm

Well, be careful with the tool you are using to send those tests.

The request must be sent as UTF-8, otherwise the standard tokenizer might produce bad results.

For example, on my French laptop with curl, I get:

curl -XGET "http://localhost:9200/_analyze?tokenizer=standard&text=pâte&pretty"

{
  "tokens" : [ {
    "token" : "pￃﾢte",
    "start_offset" : 0,
    "end_offset" : 5,
    "type" : "<ALPHANUM>",
    "position" : 1
  } ]
}
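
Both outputs in this thread are consistent with an encoding mismatch between client and server, and the following Python sketch shows how that could happen. It is a guess at the mechanism, not a confirmed diagnosis: `crude_tokens` is a made-up stand-in for a tokenizer (not Lucene's standard tokenizer), and the code points U+FFC3 and U+FFA2 are assumed to be the two odd characters in the token above.

```python
import re

def crude_tokens(text):
    """Hypothetical stand-in for a tokenizer: runs of letters, digits, underscore."""
    return re.findall(r"\w+", text)

word = "pâte"

# Case 1 (assumption): the client sends Latin-1 bytes, the server decodes UTF-8.
latin1_bytes = word.encode("latin-1")                 # b'p\xe2te'
seen_by_server = latin1_bytes.decode("utf-8", errors="replace")
# The invalid byte becomes U+FFFD, which is not a letter, so the word splits.
print(crude_tokens(seen_by_server))                   # ['p', 'te']

# Case 2 (assumption): the token above holds U+FFC3 and U+FFA2 in the middle.
garbled = "p\uffc3\uffa2te"
print(len(garbled))                                   # 5
# Keeping only the low byte of each code point gives the UTF-8 bytes of "pâte".
low_bytes = bytes(ord(ch) & 0xFF for ch in garbled)   # b'p\xc3\xa2te'
print(low_bytes == word.encode("utf-8"))              # True
```

Case 1 reproduces the two tokens with lengths 1 and 2 around a single unrecognized character, matching offsets 0-1 and 2-4. Case 2 matches the single 5-character token with end_offset 5.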

jb95 · September 30, 2015, 3:52pm

Yes, no worries there: I used tools with UTF-8.
