French Analyzer

When I use the "french" analyzer on words containing a double letter, the resulting token is missing the double letter. Is this normal? The index is built with the same analyzer.

Request:

{"analyzer":"french","text":["Village"],"tokenizer":"edge_ngram"}

Response:

{"tokens":[{"token":"vilag","start_offset":0,"end_offset":7,"type":"","position":0}]}

If the "italian" analyzer is used the result is

Request:

{"analyzer":"italian","text":["Village"],"tokenizer":"edge_ngram"}

Response:

{"tokens":[{"token":"villag","start_offset":0,"end_offset":7,"type":"","position":0}]}

Please note that the tokenizer parameter has no effect here; when you pass a named analyzer, the analyzer's own tokenizer is used.
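
If it helps, you can see this by passing an explicit tokenizer and filter chain to _analyze instead of a named analyzer. This is just a sketch, assuming the light_french stemmer is the component of the french analyzer doing the stemming:

POST _analyze
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    { "type": "stemmer", "language": "light_french" }
  ],
  "text": [
    "Village"
  ]
}

If this also returns vilag, then it's the stemming filter, not the tokenizer, that removes the double letter.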

Next time, could you share a script like:

POST _analyze
{
  "analyzer": "french",
  "text": [
    "Village"
  ]
}

That way it's easier to copy, paste, and test.

The french and italian analyzers come from Lucene. I can't tell whether the behavior is normal or not, but what problem does that behavior cause for you?
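
That said, the Elasticsearch docs describe the built-in french analyzer as using the light_french stemmer (Lucene's FrenchLightStemmer), and collapsing doubled letters looks like part of that stemmer's normalization, which would explain vilag. If the collapsed token bothers you, one thing you could experiment with is the Snowball-based french stemmer in a custom chain. A sketch, under the same assumptions as above:

POST _analyze
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    { "type": "stemmer", "language": "french" }
  ],
  "text": [
    "Village"
  ]
}

As far as I know, the Snowball stemmer keeps the double l and returns villag. Either way, search should work correctly as long as the same analyzer is used at index time and query time.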

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.