Issues with search results containing special characters

I'm working on an Elasticsearch application for searching person data, including names in both English and French. I've completed the data indexing process.

Current Issues:

  1. Case Sensitivity: Searching for "Francois" doesn't match documents containing "françois" or "François."
  2. Special Characters: Names with accents (e.g., François) don't match searches without them (e.g., francois).
  3. Stripping Accents: My current approach of stripping accented (non-ASCII) characters at search time hinders accurate matching for French names.

Desired Outcome:

I want to achieve case-insensitive and special character-insensitive matching for French names in my Elasticsearch search. This means:

  • Searching for "francois" should match documents containing "Francois," "françois," and "François."
  • Accents and other relevant special characters in French names should not affect search results.

Is there a way to achieve this?

Welcome.

Have a look at Language analyzers | Elasticsearch Guide [8.14] | Elastic

That will help you to build this.
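
For example, a minimal sketch of an index that applies the built-in french analyzer to a name field (the index name people and the field name name are placeholders):

```
# "people" and "name" are placeholder names
PUT /people
{
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "french"
      }
    }
  }
}
```

Text indexed into that field then goes through the French elision, lowercasing, stop-word, and stemming steps.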

Hi @dadoonet
I've tried analyzers. The following is the analyzer I am using:

**Filters**

```
"filter": {
  "french_elision": {
    "type": "elision",
    "articles": [
      "l", "m", "t", "qu", "n", "s", "j",
      "d", "c", "jusqu", "quoiqu", "lorsqu", "puisqu"
    ],
    "articles_case": true
  },
  "french_stop": {
    "type": "stop",
    "stopwords": "_french_"
  },
  "french_stemmer": {
    "type": "stemmer",
    "language": "french"
  }
}
```

**Analyzer**

```
"analyzer": {
  "french": {
    "tokenizer": "standard",
    "filter": [
      "french_elision",
      "lowercase",
      "french_stop",
      "french_stemmer"
    ]
  }
}
```

I will share the result from the _analyze API as well.
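
For reference, the request looks something like this (the index name people is a placeholder, with the analyzer above defined in its settings):

```
# "people" is a placeholder index name
POST /people/_analyze
{
  "analyzer": "french",
  "text": "François"
}
```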

This doesn't cover my expected results:
If the input is "francois", I want to match records containing "françois" as an exact match (right now it only works with fuzziness).
Is there anything I am missing in my analyzers?

You need to add an asciifolding token filter as well.

And then test what happens when you analyze françois with the french analyzer.
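
As a sketch, your analyzer's filter chain with asciifolding added could look like this (folding is placed right after lowercase here; placing it before or after the stemmer changes how accented words are stemmed, so test what suits your data):

```
"french": {
  "tokenizer": "standard",
  "filter": [
    "french_elision",
    "lowercase",
    "asciifolding",
    "french_stop",
    "french_stemmer"
  ]
}
```

Re-running the _analyze request above with both "françois" and "francois" should then produce the same tokens.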
