French Analyzer

When I use the "french" analyzer on words containing a double letter, the resulting token is missing the double letter. Is this normal? The index is built with the same analyzer.

Request:

{"analyzer":"french","text":["Village"],"tokenizer":"edge_ngram"}

Response:

{"tokens":[{"token":"vilag","start_offset":0,"end_offset":7,"type":"","position":0}]}

If the "italian" analyzer is used the result is

Request:

{"analyzer":"italian","text":["Village"],"tokenizer":"edge_ngram"}

Response:

{"tokens":[{"token":"villag","start_offset":0,"end_offset":7,"type":"","position":0}]}

Please note that the tokenizer parameter has no effect here; when you pass a named analyzer, the analyzer's own tokenizer is used.
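
If it helps, you can see this by passing an explicit tokenizer and filter chain to _analyze instead of a named analyzer. This is just a sketch, assuming the light_french stemmer is the component of the french analyzer doing the stemming:

POST _analyze
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    { "type": "stemmer", "language": "light_french" }
  ],
  "text": [
    "Village"
  ]
}

If this also returns vilag, then it's the stemming filter, not the tokenizer, that removes the double letter.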

Next time, could you share a script like:

POST _analyze
{
  "analyzer": "french",
  "text": [
    "Village"
  ]
}

That way it's easier to copy, paste, and test.

The french and italian analyzers come from Lucene. I can't tell whether the behavior is normal or not, but what problem does that behavior cause for you?
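
That said, the Elasticsearch docs describe the built-in french analyzer as using the light_french stemmer (Lucene's FrenchLightStemmer), and collapsing doubled letters looks like part of that stemmer's normalization, which would explain vilag. If the collapsed token bothers you, one thing you could experiment with is the Snowball-based french stemmer in a custom chain. A sketch, under the same assumptions as above:

POST _analyze
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    { "type": "stemmer", "language": "french" }
  ],
  "text": [
    "Village"
  ]
}

As far as I know, the Snowball stemmer keeps the double l and returns villag. Either way, search should work correctly as long as the same analyzer is used at index time and query time.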

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.