Make search insensitive to special characters (diacritics)

Hello, first, sorry if this is the wrong section. I am a beginner, but I need help with a search query for Elasticsearch.

I use Elasticsearch as the search engine for my website. People can search for products, and as you can see below, I filter products on fields like status, datetime_publish_start, cat_path and metahash. I know this query is very inefficient, but I don't know how to make it better; I don't have the skills yet.

The search logic looks like this:
A user types, for example, kufor modrý into the search box. I split this text on whitespace into the keywords kufor and modrý, and then I search for these keywords in the field metahash.meta.alternative_name (a result must contain all keywords). I also check the field product_number in case the user typed a product number.
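For comparison, the keyword logic described above (split on whitespace, every keyword must match) is roughly what a match query with "operator": "and" does by itself: Elasticsearch runs the text through the field's analyzer and requires every resulting term to be present. A sketch of a clause that could take the place of the inner bool with the two wildcard clauses, assuming metahash.meta.alternative_name is mapped as a text field (the post does not show the mapping):

```json
{
  "match": {
    "metahash.meta.alternative_name": {
      "query": "kufor modrý",
      "operator": "and"
    }
  }
}
```

One difference to keep in mind: match compares whole terms, so unlike *kufor* it will not find kufor inside a longer word. On its own it also does not fix the diacritics problem; that still depends on how the field is analyzed.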

But the problem is that if a user types, for example, kufor modry (without the special character "ý"), I split it into kufor and modry and find nothing, because the data in metahash.meta.alternative_name is stored with special characters. I really don't know how to make the search insensitive to special characters.
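The _analyze API shows the terms Elasticsearch actually indexes and searches for a piece of text, which makes this mismatch visible. A minimal request body for POST /_analyze; the standard tokenizer and the filter list are assumptions, since the analyzer configured on metahash.meta.alternative_name is not shown in the post:

```json
{
  "tokenizer": "standard",
  "filter": ["lowercase", "asciifolding"],
  "text": "kufor modrý"
}
```

With only lowercase in the filter list, the terms are kufor and modrý, so a search for modry has no matching term. Adding asciifolding, as in this body, turns modrý into modry.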

This is the Elasticsearch query (in JSON format) that I use:

```json
{
    "query":{
        "constant_score":{
            "filter":{
                "bool":{
                    "must":[
                        {
                            "term":{
                                "status":"y"
                            }
                        },
                        {
                            "range":{
                                "datetime_publish_start":{
                                    "lte":"now/h"
                                }
                            }
                        },
                        {
                            "term":{
                                "cat_path":30
                            }
                        }
                    ],
                    "must_not":[
                        {
                            "term":{
                                "metahash.meta.sold_out":"y"
                            }
                        }
                    ],
                    "should":[
                        {
                            "bool":{
                                "must":[
                                    {
                                        "wildcard":{
                                            "metahash.meta.alternative_name":"*kufor*"
                                        }
                                    },
                                    {
                                        "wildcard":{
                                            "metahash.meta.alternative_name":"*modrý*"
                                        }
                                    }
                                ]
                            }
                        },
                        {
                            "term":{
                                "product_number":"kufor modrý"
                            }
                        }
                    ]
                }
            }
        }
    },
    "sort":[
        {
            "ID_entity":{
                "order":"desc"
            }
        }
    ]
}
```

Thank you very much for any help.

Could you provide a full recreation script, as described in About the Elasticsearch category? It will help us better understand what you are doing. Please try to keep the example as simple as possible.

A full reproduction script will help readers to understand, reproduce and if needed fix your problem. It will also most likely help to get a faster answer.

You should fold metahash.meta.alternative_name to ASCII characters by analyzing it with a folding token filter, for example:

https://www.elastic.co/guide/en/elasticsearch/reference/7.2/analysis-asciifolding-tokenfilter.html
https://www.elastic.co/guide/en/elasticsearch/plugins/current/analysis-icu-folding.html
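A minimal sketch of the asciifolding approach, written as the body of a PUT request that creates a new index. The analyzer name folding and the subfield name folded are hypothetical. It assumes Elasticsearch 7.x typeless mappings (older versions need an extra type level inside mappings) and that metahash.meta.alternative_name is a text field:

```json
{
  "settings": {
    "analysis": {
      "analyzer": {
        "folding": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "metahash": {
        "properties": {
          "meta": {
            "properties": {
              "alternative_name": {
                "type": "text",
                "fields": {
                  "folded": {
                    "type": "text",
                    "analyzer": "folding"
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
```

The folded subfield uses the same analyzer at search time, so a match query on metahash.meta.alternative_name.folded with "operator": "and" should treat kufor modry and kufor modrý the same way. Wildcard queries on text fields are not analyzed, so if you keep them, the search text has to be lowercased and folded before it is sent. Documents indexed before the subfield existed only get it after they are reindexed, for example with _update_by_query. The icu_folding filter from the second link works similarly but requires the analysis-icu plugin.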
