[Ask suggestion] How to simplify the synonyms implementation?


(Xudong You) #1

We are going to define synonyms with multi-language support in our ES indexes.
Right now we have 10 indexes in total, all of which need multi-language support. That is, for any field that needs multi-language indexing/querying, we define multiple fields, one per language, each with its language-specific analyzer, for example title_en, title_de, etc.

So, to support synonyms, we have to override each language analyzer to add a synonym filter,
e.g.,
            "english": {
                "tokenizer": "standard",
                "filter": [
                    "english_possessive_stemmer",
                    "lowercase",
                    "english_stop",
                    "english_keywords",
                    "english_stemmer",
                    "my_synonyms"
                ]
            },
We just copy the language analyzer definition from:
https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-lang-analyzer.html
and add "my_synonyms" to the end of the filter array. "my_synonyms" is defined as follows:

            "my_synonyms": {
                "type": "synonym",
                "synonyms_path": "analysis/synonyms.txt"
            }

It works, but we have to override every language analyzer in our index schema and repeat the definitions in all 10 indexes.

Is there any way to simplify the synonyms implementation for our case?


(Loren Siebert) #2

I think you will need to overwrite each analyzer separately as you have done, presumably specifying a different synonym file for each language. I noticed you stuck my_synonyms at the end of your analysis chain, after your stemmer. In that case make sure the entries in your synonyms file are stemmed (e.g., intern for internal) or it won't work.

If you know you are going to use the exact same language mappings/settings across your 10 indexes, you can simplify things a bit by defining all of this stuff just once in an index template. If you have an index for authors and an index for books, you could name them lang-authors and lang-books and have a template around the lang-* pattern. Ditto for the fields: you can specify that all field names matching *_de use your German analysis chain.

Here is some source code that does this sort of thing, if it helps. Each language has its own custom analysis chain with its own synonyms and protected words, and it's all set up in an index template.
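
As a rough illustration of the template approach described above, here is a minimal sketch of a template body. It assumes the newer composable index template format (sent with PUT _index_template/<name>); older versions use PUT _template/<name> with a different layout. The analyzer names, the per-language synonym file paths, and the choice of English and German are assumptions made for the example, and the optional english_keywords filter from the original post is left out for brevity.

```json
{
  "index_patterns": ["lang-*"],
  "_meta": {
    "description": "Illustrative sketch only: analyzer names and synonym file paths are assumptions"
  },
  "template": {
    "settings": {
      "analysis": {
        "filter": {
          "english_possessive_stemmer": { "type": "stemmer", "language": "possessive_english" },
          "english_stop": { "type": "stop", "stopwords": "_english_" },
          "english_stemmer": { "type": "stemmer", "language": "english" },
          "english_synonyms": { "type": "synonym", "synonyms_path": "analysis/synonyms_en.txt" },
          "german_stop": { "type": "stop", "stopwords": "_german_" },
          "german_stemmer": { "type": "stemmer", "language": "light_german" },
          "german_synonyms": { "type": "synonym", "synonyms_path": "analysis/synonyms_de.txt" }
        },
        "analyzer": {
          "english_with_synonyms": {
            "tokenizer": "standard",
            "filter": [
              "english_possessive_stemmer",
              "lowercase",
              "english_stop",
              "english_stemmer",
              "english_synonyms"
            ]
          },
          "german_with_synonyms": {
            "tokenizer": "standard",
            "filter": [
              "lowercase",
              "german_stop",
              "german_normalization",
              "german_stemmer",
              "german_synonyms"
            ]
          }
        }
      }
    },
    "mappings": {
      "dynamic_templates": [
        {
          "english_fields": {
            "match": "*_en",
            "match_mapping_type": "string",
            "mapping": { "type": "text", "analyzer": "english_with_synonyms" }
          }
        },
        {
          "german_fields": {
            "match": "*_de",
            "match_mapping_type": "string",
            "mapping": { "type": "text", "analyzer": "german_with_synonyms" }
          }
        }
      ]
    }
  }
}
```

With something like this in place, any new index whose name matches lang-* picks up the analyzers, and string fields such as title_en or title_de are mapped to the matching chain when they are first indexed. Dynamic templates only apply to fields that are not already mapped explicitly, and a template only affects indexes created after it is stored.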


(Xudong You) #3

Thanks Loren!
We tried a template and it works; it significantly simplifies our synonyms implementation.

