Synonyms Analyzer/Filter does not work - Solved!


(Arun A Nayagam) #1

Hi,

I created a simple synonym analyzer/filter and applied it to the index. At least it looks like it was applied correctly; here is an extract from the cluster state:

     "sports-2018.07.13": {
                "state": "open",
                "settings": {
                    "index": {
                        "creation_date": "1531510636226",
                        "uuid": "teSeXoRQS4WBkM7ylKzgdw",
                        "analysis": {
                            "filter": {
                                "synonym_filter": {
                                    "type": "synonym",
                                    "synonyms": [
                                        "Manchester United, Man U, Man Utd"
                                    ],
                                    "tokenizer": "keyword"
                                }
                            },
                            "analyzer": {
                                "synonym_analyzer": {
                                    "filter": [
                                        "synonym_filter"
                                    ],
                                    "tokenizer": "keyword"
                                }
                            }
                        },
                        "number_of_replicas": "1",
                        "number_of_shards": "5",
                        "version": {
                            "created": "2040099"
                        }
                    }
                },
                "mappings": {...

I know for sure there are records because I see them in Kibana.

But when I query like this, I don't get any results back:

{
   "query":{
      "filtered":{
         "query":{
            "multi_match":{
               "query":"Man Utd",
               "fields":[
                  "event.eventName"
               ],
               "analyzer":"synonym_analyzer"
            }
         }
      }
   }
}

Any help is appreciated.

Thanks,
Arun


(Jun Ohtani) #2

Hi,

Could you post your mapping for "event.eventName"?
I suspect the analyzer of event.eventName is neither the keyword analyzer nor synonym_analyzer.

If the analyzer is standard, "Man Utd" is split into single words at index time: "man" and "utd".
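
One way to check this, as a sketch: the _analyze API accepts a `field` parameter and returns the tokens that the field's analyzer produces. The body below assumes the index name from the first post and that eventName has no explicit analyzer; send it to GET sports-2018.07.13/_analyze.

```json
{
  "field": "event.eventName",
  "text": "Man Utd vs Arsenal"
}
```

If the standard analyzer is in effect, the response should list separate lowercase tokens such as "man" and "utd". A query analyzed with the keyword tokenizer produces whole-string terms, which cannot match those tokens.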


(Arun A Nayagam) #3

Hi Jun,

You are right. I didn't realise you have to set the analyzer of event.eventName to keyword/synonym_analyzer.
I have not set any analyzer explicitly,

"event": {
    "properties": {
        "eventName": {
            "type": "string"
        }

Should I update the mapping with a "keyword" analyzer for that field?

Thanks,
Arun


(Jun Ohtani) #4

Yes, you should use a keyword field.
Unfortunately, you cannot update the mapping of an existing field.
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-put-mapping.html#updating-field-mappings

You should reindex your data, either with the reindex API or by indexing again from the source.
Note that you have to create the index, with the fields that should use synonym_analyzer, before you index your data.
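
As a rough sketch of the reindex step, the body below would be sent to POST _reindex. The destination name sports-v2 is hypothetical. The request assumes that index was already created with the analysis settings and field mapping you want, and that your cluster version provides the reindex API.

```json
{
  "source": { "index": "sports-2018.07.13" },
  "dest": { "index": "sports-v2" }
}
```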


(Arun A Nayagam) #5

Hi @johtani ,

Updating the mapping is fine.

Does changing just the type to keyword enable me to use the synonym_analyzer?

"event": {
    "properties": {
        "eventName": {
            "type": "keyword"
        }

Please note that the eventName will be something like this:

Man Utd vs Arsenal

Can I use type keyword on event.eventName and still use the synonym_analyzer to find
event.eventName: "Manchester United" or event.eventName: "Man U"?

Thanks,
Arun


(Jun Ohtani) #6

In your use case, you shouldn't use the keyword tokenizer any more.
If you use the keyword tokenizer, you cannot search for the text "vs" on its own, because the whole field value becomes a single token.
There is good documentation on this; unfortunately it covers an old version, but you can still understand the concept from it.
There is also another good piece of documentation on the inverted index.
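
To illustrate, here is a minimal sketch using the _analyze API with an ad-hoc tokenizer. No index is needed; send the body to GET _analyze.

```json
{
  "tokenizer": "keyword",
  "text": "Man Utd vs Arsenal"
}
```

The keyword tokenizer emits the whole input as one token, "Man Utd vs Arsenal", so a query for just "vs" or "Arsenal" has no term to match.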


(Arun A Nayagam) #7

Hi @johtani,

I will definitely go through those links and try to understand them.
In the meantime, I hope you understood my requirement. With that in mind, could you suggest a tokenizer that I can use?

Thanks,
Arun


(Jun Ohtani) #8

Unfortunately, I don't understand it completely.
I think the "standard" or "whitespace" tokenizer is fine.

The standard analyzer would also be helpful.

Another point to consider is whether you need multi-word synonyms.
See the synonym graph token filter.


(Arun A Nayagam) #9

Hi @johtani,

Just to clarify the requirement:

I have an event.eventName field that can have values like this:

Manchester United vs Chelsea

Arsenal vs Man Utd

Tottenham vs Man U

Unfortunately I am not allowed to change the value at index time.
So I thought I could write a query-time synonym analyzer to query all variants of Manchester United.

Now I am not sure whether I require multi-word synonyms.
The more I go through your links, the more complex my requirement looks :)

Thanks,
Arun


(Arun A Nayagam) #10

@johtani Thank you for pointing me in the right direction.

Multi-word synonyms and the associated issues with Lucene are well documented here.

For the benefit of all, here are the steps I took to define the dictionary and to query multi-word synonyms using a filter of type "synonym_graph":

  1. Close the index so the analyzer can be defined:

POST {{server}}/sportsbook-event/_close

  2. Add the analyzer/filter:

PUT {{server}}/sportsbook-event/_settings
{
   "settings":{
      "analysis":{
         "analyzer":{
            "event_name_synonym_analyzer":{
               "tokenizer":"standard",
               "filter":[
                  "lowercase",
                  "event_name_synonym_filter"
               ]
            }
         },
         "filter":{
            "event_name_synonym_filter":{
               "type":"synonym_graph",
               "synonyms":[
                  "manchester united,manchester utd,man u",
                  "new york, ny"
               ]
            }
         }
      }
   }
}

  3. Open the index:

POST {{server}}/sportsbook-event/_open

  4. Running explain on a query_string query versus a match_phrase query is quite interesting and shows exactly what is being searched:

match_phrase :

GET sportsbook-event/_validate/query?explain

{
  "query": {
    "match_phrase": {
      "event.eventName": {
        "query": "ny",
        "analyzer": "event_name_synonym_analyzer"
      }
    }
  }
}
{
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "valid": true,
  "explanations": [
    {
      "index": "sportsbook-event",
      "valid": true,
      "explanation": "spanOr([spanNear([event.eventName:new, event.eventName:york], 0, true), event.eventName:ny])"
    }
  ]
}

query_string:

GET sportsbook-event/_validate/query?explain
{
  "query": {
    "query_string": {
      "default_field": "event.eventName",
      "query": "new york",
      "analyzer": "event_name_synonym_analyzer"
    }
  }
}

{
  "_shards": {
    "total": 1,
    "successful": 1,
    "failed": 0
  },
  "valid": true,
  "explanations": [
    {
      "index": "sportsbook-event",
      "valid": true,
      "explanation": """(event.eventName:ny event.eventName:"new york")"""
    }
  ]
}
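
The same expansion can be inspected at the token level. As a sketch, sending this body to GET sportsbook-event/_analyze should list the tokens that event_name_synonym_analyzer emits for a query string, including the variants added by the synonym_graph filter:

```json
{
  "analyzer": "event_name_synonym_analyzer",
  "text": "man u"
}
```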

Search results seem to be quite accurate.
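
For reference, a sketch of the actual search, reusing the match_phrase query shown above with a different input; send the body to GET sportsbook-event/_search.

```json
{
  "query": {
    "match_phrase": {
      "event.eventName": {
        "query": "man u",
        "analyzer": "event_name_synonym_analyzer"
      }
    }
  }
}
```

One caveat, as far as I can tell: the synonym list above does not contain "man utd", so a value such as "Arsenal vs Man Utd" would presumably not be returned unless that variant is added to the rule.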

Thanks,
Arun

