[Ask suggestion] How to simplify the synonyms implementation?

Youxu · August 24, 2015, 6:05am

We are going to define synonyms with multi-language support in our ES indexes.
Right now we have 10 indexes in total, all of which need multi-language support. That is, for any field that needs multi-language indexing/querying, we define multiple fields, one per language, each with its own language analyzer: for example, title_en, title_de, etc.

So, to support synonyms, we have to overwrite each language analyzer to add a synonym filter, e.g.:
    "english": {
        "tokenizer": "standard",
        "filter": [
            "english_possessive_stemmer",
            "lowercase",
            "english_stop",
            "english_keywords",
            "english_stemmer",
            "my_synonyms"
        ]
    },
We just copied the language analyzer definition from
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-lang-analyzer.html
and added "my_synonyms" to the filter array, where "my_synonyms" is defined as follows:

            "my_synonyms": {
                "type": "synonym",
                "synonyms_path": "analysis/synonyms.txt"
            }
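
For reference, a minimal sketch of the complete `analysis` settings this approach produces for English, written as a Python dict and printed as JSON (the `settings` body of an index-creation request). The filter definitions follow the rebuilt `english` analyzer on the reference page linked above; the `example` entry in `english_keywords` is a placeholder, and the function name `english_with_synonyms` is hypothetical.

```python
import json


def english_with_synonyms(synonyms_path="analysis/synonyms.txt"):
    """Return index settings that rebuild the English analyzer and append
    a file-based synonym filter at the end of its filter chain."""
    return {
        "analysis": {
            "filter": {
                "english_stop": {"type": "stop", "stopwords": "_english_"},
                # Placeholder: list words that must never be stemmed.
                "english_keywords": {"type": "keyword_marker", "keywords": ["example"]},
                "english_stemmer": {"type": "stemmer", "language": "english"},
                "english_possessive_stemmer": {"type": "stemmer", "language": "possessive_english"},
                "my_synonyms": {"type": "synonym", "synonyms_path": synonyms_path},
            },
            "analyzer": {
                "english": {
                    "tokenizer": "standard",
                    "filter": [
                        "english_possessive_stemmer",
                        "lowercase",
                        "english_stop",
                        "english_keywords",
                        "english_stemmer",
                        # Runs last, so it only ever sees stemmed tokens.
                        "my_synonyms",
                    ],
                }
            },
        }
    }


print(json.dumps({"settings": english_with_synonyms()}, indent=2))
```

Every filter the analyzer names has to be defined in the same `analysis` block, which is why copying one analyzer really means copying five or six definitions per language.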

It works, but we have to overwrite every language analyzer in our index schema and repeat those definitions in every index.

Is there any way to simplify the synonyms implementation for our case?

loren · August 24, 2015, 4:24pm

I think you will need to overwrite each analyzer separately as you have done, presumably specifying a different synonym file for each language. I noticed you stuck my_synonyms at the end of your analysis chain, after your stemmer. In that case make sure the entries in your synonyms file are stemmed (e.g., intern for internal) or it won't work.
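
To see why the entries must be stemmed, here is a toy model of that chain. The `toy_stem` function is a hypothetical stand-in for the real English stemmer (it only strips a trailing "al", enough for the internal → intern case above), `inhouse` is a made-up synonym, and `analyze` models the synonym filter as a plain single-token lookup; none of this is Elasticsearch code.

```python
def toy_stem(token: str) -> str:
    # Hypothetical stemmer: strip a trailing "al" from longer words only.
    return token[:-2] if token.endswith("al") and len(token) > 4 else token


def analyze(text, synonyms):
    """Lowercase, stem, then look up synonyms on the *stemmed* tokens."""
    out = []
    for token in (toy_stem(t) for t in text.lower().split()):
        out.append(token)
        out.extend(synonyms.get(token, []))
    return out


raw = {"internal": ["inhouse"]}  # entries written as plain words
stemmed = {toy_stem(k): [toy_stem(v) for v in vs] for k, vs in raw.items()}

print(analyze("Internal tools", raw))      # ['intern', 'tools']  (no match)
print(analyze("Internal tools", stemmed))  # ['intern', 'inhouse', 'tools']
```

Because the stemmer runs first, the lookup only ever sees "intern", so an entry keyed on "internal" never fires. Newer Elasticsearch releases parse synonym entries through the filters that precede the synonym filter, so check the synonym filter documentation for the version you run before hand-stemming the file.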

If you know you are going to use the exact same language mappings/settings across your 10 indexes, you can simplify things a bit by defining all of this stuff just once in an index template. If you have an index for authors and an index for books, you could name them lang-authors and lang-books and have a template around the lang-* pattern. Ditto for the fields: with a dynamic template you can specify that all fields whose names match *_de use your German analysis chain.

Here is some source code that does this sort of thing, if it helps. Each language has its own custom analysis chain with its own synonyms and protected words, and it's all set up in an index template.
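
A minimal sketch of that idea, assuming a recent Elasticsearch that accepts composable index templates (`PUT _index_template/<name>`, 7.8 and later); releases from 2015 used the legacy `_template` endpoint with a `template` pattern key, per-type mappings and `string` fields instead. The script only builds and prints the request body. The `lang-*` pattern comes from the post above; the `LANGUAGES` table, the per-language synonym paths under `analysis/`, and names such as `en_syn` are hypothetical, and the filter chains are simplified versions of the rebuilt language analyzers with a synonym filter appended.

```python
import json

# Hypothetical per-language table: stemmer language, stopword set, extra filters.
LANGUAGES = {
    "en": {"stemmer": "english", "stopwords": "_english_"},
    "de": {"stemmer": "light_german", "stopwords": "_german_",
           "extra": ["german_normalization"]},
}


def build_template(pattern="lang-*"):
    filters, analyzers, dynamic = {}, {}, []
    for code, cfg in LANGUAGES.items():
        filters[f"{code}_stop"] = {"type": "stop", "stopwords": cfg["stopwords"]}
        filters[f"{code}_stemmer"] = {"type": "stemmer", "language": cfg["stemmer"]}
        # One synonyms file per language, e.g. analysis/synonyms_de.txt.
        filters[f"{code}_synonyms"] = {"type": "synonym",
                                       "synonyms_path": f"analysis/synonyms_{code}.txt"}
        analyzers[f"{code}_syn"] = {
            "tokenizer": "standard",
            "filter": ["lowercase", f"{code}_stop", *cfg.get("extra", []),
                       f"{code}_stemmer", f"{code}_synonyms"],
        }
        # Any string field named *_<code> (title_en, body_de, ...) gets that analyzer.
        dynamic.append({
            f"{code}_fields": {
                "match": f"*_{code}",
                "match_mapping_type": "string",
                "mapping": {"type": "text", "analyzer": f"{code}_syn"},
            }
        })
    return {
        "index_patterns": [pattern],
        "template": {
            "settings": {"analysis": {"filter": filters, "analyzer": analyzers}},
            "mappings": {"dynamic_templates": dynamic},
        },
    }


print(json.dumps(build_template(), indent=2))
```

Adding a language then means adding one row to the table. Templates are applied only when an index is created, so existing indexes keep their current settings until they are recreated.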

Youxu · August 25, 2015, 4:22am

Thanks Loren!
We tried an index template; it works and significantly simplifies our synonyms implementation.
