Using synonym_graph means non-synonyms are not found

Hi, I have this analyzer:

settings = {
    "analysis": {
        "analyzer": {
            "gale_analyzer": {
                "tokenizer": "standard",
                "filter": [
                    "lowercase",
                    "x_synonyms"
                ]
            }
        },
        "filter": {
            "x_synonyms": {
                "type": "synonym_graph",
                "synonyms_path": "dev_elastic_syn.csv",
                "updateable": True
            }
        }
    }
}

It works great for a word that has a synonym in dev_elastic_syn.csv, but a word that appears in a document and is not in that file isn't found. If I run the same search without the analyzer, the word is found. Any ideas?

Here's the query

query = {
    "match_phrase": {
        "text": {
            "query": ngram,
            # "analyzer": "x_analyzer"
        }
    }
}

I have to comment out the analyzer to get it to find words not in dev_elastic_syn.csv.

What is the mapping for the indexed field that is queried?

I didn't set up an explicit mapping. There is only one other field that is like a serial number. I'll try setting up the mapping to see if that fixes it. Thanks!

The mapping in the index determines how your field is analysed and which tokens are indexed. By default the same analyser is also used at query time, so the query string is tokenised the same way as the indexed field. If you specify a search-time analyser, you need to ensure that the tokens it produces match what you have indexed, or you will not get any matches.
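As a rough sketch of how that could be spelled out explicitly (not necessarily what was done in this thread): the field is indexed with a plain analyzer and searched with the synonym analyzer, and both share the same tokenizer and lowercase filter so the tokens line up. The name `plain_analyzer` is hypothetical; `gale_analyzer`, `x_synonyms` and the `text` field come from the posts above. As far as I know, the docs describe `updateable` synonym filters as intended for search analyzers, which is why the synonym filter only appears on the search side here.

```python
# Sketch only: "plain_analyzer" is a made-up name for illustration.
# "gale_analyzer", "x_synonyms" and the "text" field come from the thread.
settings = {
    "analysis": {
        "analyzer": {
            # Index-time analyzer: standard tokenizer + lowercase, no synonyms.
            "plain_analyzer": {
                "tokenizer": "standard",
                "filter": ["lowercase"],
            },
            # Search-time analyzer: the same chain plus the synonym filter.
            "gale_analyzer": {
                "tokenizer": "standard",
                "filter": ["lowercase", "x_synonyms"],
            },
        },
        "filter": {
            "x_synonyms": {
                "type": "synonym_graph",
                "synonyms_path": "dev_elastic_syn.csv",
                "updateable": True,
            }
        },
    }
}

mappings = {
    "properties": {
        "text": {
            "type": "text",
            "analyzer": "plain_analyzer",        # used when documents are indexed
            "search_analyzer": "gale_analyzer",  # used for the query string
        }
    }
}
```

Both dicts would be passed when the index is created. With a mapping like this the query no longer needs its own "analyzer" entry, because the field's search analyzer is picked up by default.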

Thanks. I'm a little confused because I thought the synonyms were not supposed to be applied at index time, because that takes up too much space or something.

If I map a synonym analyzer then it will apply the synonyms at index time, right?

I remember reading that you could do synonyms at index time or at query time, and that it was better to do them at query time. I guess I'm worried that if I map an analyzer with synonyms, it will apply them at index time.

I'm having trouble keeping track of all of the moving parts. If you don't map, at query time it should apply the synonyms to both the query and each document, right?

This is separate from the "analyzer" vs "search_analyzer" settings in the mapping. I don't see any reason to specify a different analyzer for querying in my use case. Thanks so much for your help.

I may be wrong. To be sure, I would recommend using the analyze API to see how the indexed field is tokenised (check the mappings) and comparing that with the output of running the query string through the analyser that includes the synonym filter.
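A minimal sketch of that comparison, assuming the field is called `text` and the search analyzer is `gale_analyzer` as in the posts above. The helper names are hypothetical. The two request bodies are meant for the index-level `_analyze` endpoint, and the helpers only read the `tokens` list that the analyze API returns, so no particular client library is assumed.

```python
def analyze_bodies(text, field="text", search_analyzer="gale_analyzer"):
    """Build two _analyze request bodies for the same input text.

    The first uses the analyzer derived from the field's mapping (index side),
    the second uses the named analyzer (search side).
    """
    index_body = {"field": field, "text": text}
    search_body = {"analyzer": search_analyzer, "text": text}
    return index_body, search_body


def token_set(response):
    """Collect the token strings from an _analyze response."""
    return {t["token"] for t in response.get("tokens", [])}


def shared_terms(index_response, search_response):
    """Terms produced on both sides; an empty set means a single-word
    query cannot match what was indexed."""
    return token_set(index_response) & token_set(search_response)


# Hand-written responses standing in for real API output (illustrative only).
index_resp = {"tokens": [{"token": "gale", "position": 0}]}
search_resp = {"tokens": [{"token": "gale", "position": 0},
                          {"token": "storm", "position": 0}]}
print(shared_terms(index_resp, search_resp))
```

If the shared set comes back empty for a word that is in a document but not in the synonym file, the two analyzers are producing different tokens for it.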

Thanks. Yeah, I tried the analyze API and everything seemed to be working fine.

Maybe I will try the mapping with the synonyms at index time (if that's how it works) and see if that helps. I've got plenty of space. If I need to update the synonym list I can just re-index. Thanks!

I'll report back and let you know if the mapping fixed my problems. Thanks again!

Yep, that worked, thanks!
