Using synonym_graph means non-synonyms are not found

Jonathan_Mugan · February 21, 2023, 4:44am

Hi, I have this analyzer

settings = {
    "analysis": {
        "analyzer": {
            "gale_analyzer": {
                "tokenizer": "standard",
                "filter": [
                    "lowercase",
                    "x_synonyms"
                ]
            }
        },
        "filter": {
            "x_synonyms": {
                "type": "synonym_graph",
                "synonyms_path": "dev_elastic_syn.csv",
                "updateable": True
            }
        }
    }
}

It works great for a word that has a synonym in dev_elastic_syn.csv, but a word that appears in a document and not in that file isn't found. If I do the same search without the analyzer, it is found. Any ideas?

Here's the query

query = {
    "match_phrase": {
        "text": {
            "query": ngram,
            # "analyzer": "x_analyzer"
        }
    }
}

I have to comment out the analyzer to get it to find words not in dev_elastic_syn.csv.

Christian_Dahlqvist · February 21, 2023, 8:22am

What is the mapping for the indexed field that is queried?

Jonathan_Mugan · February 21, 2023, 5:55pm

I didn't set up an explicit mapping. There is only one other field that is like a serial number. I'll try setting up the mapping to see if that fixes it. Thanks!

Christian_Dahlqvist · February 21, 2023, 6:04pm

The mapping in the index determines how your field is analysed and which tokens are indexed. By default this is also the analyser used when querying, so the query string is tokenised the same way as the indexed field. If you specify a search-time analyser, you need to ensure that the tokens it produces match what you have indexed, or you will not get any matches.
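
As a rough illustration of the point above, the sketch below builds an index body in which the `text` field has an explicit mapping: a plain analyzer at index time and the synonym analyzer only at search time. The names `plain_analyzer` and `build_index_body` are made up for this example, and the field name `text` is taken from the query earlier in the thread. It is one possible arrangement under those assumptions, not necessarily the setup that was eventually used.

```python
def build_index_body(synonyms_path="dev_elastic_syn.csv"):
    """Return a settings + mappings body for index creation (illustrative only)."""
    settings = {
        "analysis": {
            "analyzer": {
                # Hypothetical index-time analyzer: same tokenizer and
                # lowercasing as gale_analyzer, but without the synonym filter.
                "plain_analyzer": {
                    "tokenizer": "standard",
                    "filter": ["lowercase"],
                },
                # The analyzer from the original post, unchanged.
                "gale_analyzer": {
                    "tokenizer": "standard",
                    "filter": ["lowercase", "x_synonyms"],
                },
            },
            "filter": {
                "x_synonyms": {
                    "type": "synonym_graph",
                    "synonyms_path": synonyms_path,
                    # Elasticsearch documents updateable synonym filters as
                    # intended for search-time analyzers.
                    "updateable": True,
                }
            },
        }
    }
    mappings = {
        "properties": {
            "text": {
                "type": "text",
                # Used when documents are indexed.
                "analyzer": "plain_analyzer",
                # Used by default when the field is queried.
                "search_analyzer": "gale_analyzer",
            }
        }
    }
    return {"settings": settings, "mappings": mappings}
```

With a mapping like this, the query would not need its own "analyzer" entry, because the field's search_analyzer is applied by default at query time.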

Jonathan_Mugan · February 21, 2023, 6:12pm

Thanks. I'm a little confused because I thought the synonyms were not supposed to be applied at index time, because that takes up too much space or something.

If I map a synonym analyzer then it will apply the synonyms at index time, right?

I remember reading that you could do synonyms at index or at query time and that it was better to do them at query time. I guess I'm worried that if I map an analyzer with synonyms it will apply them at index time.

I'm having trouble keeping track of all of the moving parts. If you don't map, at query time it should apply the synonyms to both the query and each document, right?

This is separate from the "analyzer" vs "search_analyzer" setting in the mapping. I don't see any reason to specify a different analyzer for querying for my use case. Thanks so much for your help.

Christian_Dahlqvist · February 21, 2023, 6:20pm

I may be wrong. To be sure, I would recommend using the analyze API to see how the indexed field is tokenised (check the mappings) and comparing this with the result of running the query string through the analyser that includes the synonyms.
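
The comparison suggested above might look something like the sketch below. It assumes `es` is an elasticsearch-py client and that `indices.analyze` accepts a `body` argument (newer client versions may prefer keyword arguments instead). The helper names `token_texts` and `compare_analysis` are made up for this example.

```python
def token_texts(response):
    """Extract the token strings from an _analyze response."""
    return [token["token"] for token in response["tokens"]]


def compare_analysis(es, index, field, search_analyzer, text):
    """Analyze `text` the way the field indexes it and the way the query would.

    Returns two lists of tokens: (index-time tokens, search-time tokens).
    """
    # Passing "field" makes the analyze API use the analyzer from the mapping.
    indexed = es.indices.analyze(index=index, body={"field": field, "text": text})
    # Passing "analyzer" runs the named analyzer defined in the index settings.
    queried = es.indices.analyze(
        index=index, body={"analyzer": search_analyzer, "text": text}
    )
    return token_texts(indexed), token_texts(queried)
```

A call such as `compare_analysis(es, "my_index", "text", "gale_analyzer", "some phrase")` (the index name here is a placeholder) returns both token lists so they can be inspected side by side. If the search-time tokens for a non-synonym word differ from the index-time tokens, that would point to the kind of mismatch described above.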

Jonathan_Mugan · February 21, 2023, 6:23pm

Thanks. Yeah, I tried the analyze API and everything seemed to be working fine.

Maybe I will try the mapping with the synonyms at index time (if that's how it works) and see if that helps. I've got plenty of space. If I need to update the synonym list I can just re-index. Thanks!

Jonathan_Mugan · February 21, 2023, 6:24pm

I'll report back and let you know if the mapping fixed my problems. Thanks again!

Jonathan_Mugan · February 22, 2023, 4:20am

Yep, that worked, thanks!

