Correctly set up index analyzer and search analyzer

Hi, I'm trying to set up a simple analyzer to search questions and answers in our Elasticsearch cluster.
We need the option to use the English language analyzer (with the default stopwords) together with custom synonyms and stopwords. As the latter are going to change from time to time, I decided to include them only in the search_analyzer so that I can use updateable=True. Here is my setup (sorry for the weird formatting, I'm using the Elasticsearch Python DSL):

from elasticsearch_dsl import analyzer, token_filter

# index-time analyzer: English stemming plus the default English stopwords
index_analyzer = analyzer(
    'english_analyzer',
    tokenizer="standard",
    filter=["lowercase",
            token_filter('english_stemmer',
                         type='stemmer',
                         language='english'),
            token_filter('english_excluded_words',
                         type='stop',
                         stopwords='_english_'),
            token_filter('english_possessive_stemmer',
                         type='stemmer',
                         language='possessive_english')
            ])

search_analyzer = analyzer(
    'english_search_analyzer',
    tokenizer="standard",
    filter=["lowercase",
            token_filter('english_stemmer',
                         type='stemmer',
                         language='english'),
            token_filter('english_excluded_words',
                         type='stop',
                         stopwords='_english_'),
            token_filter('english_possessive_stemmer',
                         type='stemmer',
                         language='possessive_english'),
            # the updateable filters live only in the search analyzer,
            # so the files can change without reindexing
            token_filter('synonyms',
                         type='synonym',
                         synonyms_path="analyzers/F79458176",
                         updateable=True),
            token_filter('custom_stopwords',
                         type='stop',
                         stopwords_path="analyzers/F261792345",
                         updateable=True),
            ])
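
For completeness, this is roughly how I attach the two analyzers to the mapping with the DSL; the document class, field name and index name below are just placeholders:

from elasticsearch_dsl import Document, Text

class QA(Document):
    # index_analyzer runs at index time, search_analyzer at query time
    question = Text(analyzer=index_analyzer, search_analyzer=search_analyzer)

    class Index:
        name = 'questions'  # placeholder index name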

The paths are defined like this because I'm using packages in AWS. I tested a similar setup with only a handful of synonyms and without the custom_stopwords filter, and it was working just fine. Now that I've added all the synonyms and some custom stopwords, I'm getting an error when initializing the index:

RequestError(400, 'illegal_argument_exception', 'failed to build synonyms'): RequestError

without any additional info.
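
In hindsight, one way to poke at an analyzer from Python is the DSL's simulate() helper, which wraps the _analyze API. Here is a sketch for the index analyzer only, since I don't know whether the updateable search-time filters can be exercised this way:

from elasticsearch_dsl import connections

connections.create_connection(hosts=["localhost"])  # placeholder connection

# Sends the inline analyzer definition to the _analyze endpoint
tokens = index_analyzer.simulate("ACARS message from the aircraft")
print([t.token for t in tokens.tokens])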

Is this related to a failure parsing the synonyms file, or to my setup with multiple stopword token filters? And do I really need to repeat the first three token filters in the search analyzer even though they are already in the index analyzer?

I tried removing the custom stopwords filter and I still got the error, so it has to be related to my synonyms file.

Here is an excerpt from my synonyms file; basically I'm mapping acronyms and model names to their equivalent spellings.

A321 Neo,A321Neo,A321N,A321NX
A/C,Aircraft  
A/THR,Autothrust
AAC,Airline Administrative Communication
AAP,Additional Attendant Panel
ABD,Airbus Directive and Procedure  
AC,Aircraft Characteristics,Alternating current,Advisory Circular
ACARS,Aircraft Communication Addressing and Reporting System
ACAS,Airborne Collision Avoidance System
ACB,Attendant Call Button
ACD,Aircraft Control Domain
iCMT,interactive Cabin Management Terminal
iPRAM,Integrated Pre-Recorded Announcement Module 

So in the end I found the solution.

The problem was that the stopwords token filter was placed BEFORE the synonyms filter and, as you can see from the excerpt, some of the synonym rules contain English stopwords (the "and" in "Airbus Directive and Procedure", for example). Elasticsearch parses synonym rules by running them through the filters that precede the synonym filter in the chain, so the stop filter was removing tokens from the multi-word rules and the build failed. This was causing all sorts of errors when initializing the index, but unfortunately I could not actually see the root cause because I'm using the Python client.
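
As a side note for anyone hitting the same wall: the Python client does carry the full error body on the exception, which is where the root cause hides. Something like this (using the placeholder QA document class from above) would have shown it:

from elasticsearch.exceptions import RequestError

try:
    QA.init()  # creates the index together with the analyzers
except RequestError as e:
    # e.info is the raw response body, including the nested root cause
    print(e.info)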

I resolved everything by moving the synonyms filter up, right after lowercase and before the stop filters, so that the search analyzer now looks like this:

search_analyzer = analyzer(
    'english_search_analyzer',
    tokenizer="standard",
    filter=["lowercase",
            # synonyms come first so that no stop filter can remove
            # tokens from the multi-word rules while they are parsed
            token_filter('synonyms',
                         type='synonym',
                         synonyms_path="analyzers/F79458176",
                         updateable=True),
            token_filter('english_stemmer',
                         type='stemmer',
                         language='english'),
            token_filter('english_excluded_words',
                         type='stop',
                         stopwords='_english_'),
            token_filter('english_possessive_stemmer',
                         type='stemmer',
                         language='possessive_english'),
            token_filter('custom_stopwords',
                         type='stop',
                         stopwords_path="analyzers/F261792345",
                         updateable=True),
            ])
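
One last note: since the filters are marked updateable, after changing the synonyms or stopwords (for me, updating the AWS package) the search analyzers have to be reloaded for the changes to take effect. With the plain Python client that looks roughly like this (the index name is a placeholder, and the managed AWS endpoint may wrap this differently):

from elasticsearch import Elasticsearch

client = Elasticsearch()  # connection details omitted
# Re-reads the updateable synonym/stopword files without reindexing
client.indices.reload_search_analyzers(index="questions")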
