Hi, I'm trying to set up a simple analyzer to search questions and answers in our Elasticsearch cluster.
We need the English language analyzer (with its default stopwords) combined with custom synonyms and stopwords. Since the latter will change from time to time, I decided to include them only in the search_analyzer so that I can use updateable=True. Here is my setup (sorry for the weird formatting, I'm using the Elasticsearch Python DSL):
index_analyzer = analyzer(
    'english_analyzer',
    tokenizer="standard",
    filter=[
        "lowercase",
        token_filter('english_stemmer',
                     type='stemmer',
                     language='english'),
        token_filter('english_excluded_words',
                     type='stop',
                     stopwords='_english_'),
        token_filter('english_possessive_stemmer',
                     type='stemmer',
                     language='possessive_english'),
    ])
search_analyzer = analyzer(
    'english_search_analyzer',
    tokenizer="standard",
    filter=[
        "lowercase",
        token_filter('english_stemmer',
                     type='stemmer',
                     language='english'),
        token_filter('english_excluded_words',
                     type='stop',
                     stopwords='_english_'),
        token_filter('english_possessive_stemmer',
                     type='stemmer',
                     language='possessive_english'),
        token_filter('synonyms',
                     type='synonym',
                     synonyms_path="analyzers/F79458176",
                     updateable=True),
        token_filter('custom_stopwords',
                     type='stop',
                     stopwords_path="analyzers/F261792345",
                     updateable=True),
    ])
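For reference, the raw index settings that I believe this DSL code should generate look roughly like this (a sketch, not the actual output of my code — filter names and order match the Python above):

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "english_stemmer":            { "type": "stemmer", "language": "english" },
        "english_excluded_words":     { "type": "stop", "stopwords": "_english_" },
        "english_possessive_stemmer": { "type": "stemmer", "language": "possessive_english" },
        "synonyms":                   { "type": "synonym", "synonyms_path": "analyzers/F79458176", "updateable": true },
        "custom_stopwords":           { "type": "stop", "stopwords_path": "analyzers/F261792345", "updateable": true }
      },
      "analyzer": {
        "english_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "english_stemmer", "english_excluded_words", "english_possessive_stemmer"]
        },
        "english_search_analyzer": {
          "tokenizer": "standard",
          "filter": ["lowercase", "english_stemmer", "english_excluded_words", "english_possessive_stemmer", "synonyms", "custom_stopwords"]
        }
      }
    }
  }
}
```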
The paths are defined like this because I'm using packages in AWS. I tested a similar setup without the custom_stopwords filter and with only a handful of synonyms, and it worked just fine. Now that I've added all the synonyms plus some custom stopwords, I get an error when initializing the index:

RequestError(400, 'illegal_argument_exception', 'failed to build synonyms')

with no additional information.
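For context, the synonyms file in the package follows the default Solr synonym format — something like this (an invented example, not my actual rules):

```
# Solr-format synonym rules
laptop, notebook
i-pod, i pod => ipod
```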
Is this caused by a failure to parse the synonyms file, or by my setup with multiple stop token filters? And do I really need to repeat the first three token filters in the search analyzer even though they are already in the index analyzer?