Synonym Problem

RobBob · February 10, 2017, 6:04pm

Hello,

I am running into an issue with a synonym pair I have set up. I have the synonyms "bar,pub", and when returning results from about 3000 different categories, one of the two will not appear for a direct search query.

Below are example results for each query:

And here is my current configuration. Any suggestions on what I could do to make sure bar will appear in the first 10 results and pub would return in the first 10 results when queried for?

curl -XPUT http://localhost:9200/categories -d '{
"settings": {
  "analysis": {
     "filter": {
        "edge_ngram_filter": {
           "type": "edge_ngram",
           "min_gram": 1,
           "max_gram": 20
        },
        "category_synonym_filter": {
           "type": "synonym",
           "synonyms": ["bike,bicycle", "bar,pub", "shop,store", "burger,hamburger", "bbq,barbecue", "isp,internet service provider", "exterminator,pest control service", "adult entertainment club,strip club"]
        }
     },
     "analyzer": {
        "edge_ngram_analyzer": {
           "type": "custom",
           "tokenizer": "standard",
           "filter": [
              "lowercase",
              "asciifolding",
              "edge_ngram_filter"
           ]
        },
        "search_analyzer": {
           "type": "custom",
           "tokenizer": "standard",
           "filter": [
              "lowercase",
              "asciifolding",
              "category_synonym_filter"
           ]
        }
     }
  }
},
"mappings": {
  "category": {
     "properties": {
        "category_description": {
           "type": "string",
           "analyzer": "edge_ngram_analyzer",
           "search_analyzer": "search_analyzer"
        },
        "type": {
           "type": "string",
           "index": "not_analyzed"
        }
     }
  }
}
}'

Thank you for any help in advance!

RobBob · February 12, 2017, 7:13pm

Is there any more information I can provide that would help someone guide me in the right direction? Thanks!

Mark_Harwood · February 13, 2017, 10:00am

A user's search input does not have to be modelled as a single query clause.
Often it's beneficial to try several different interpretations of their input in a single search request using an array of queries in the should clause of a containing bool query. The more clauses that match a should array, the better the score.
You could try (in reverse order of importance):

1) An exact-match phrase query on full words
2) An exact-match query on full words
3) A partial-match query using n-grams

Currently you are only doing 3). If you also index minus the n-grams you can do 2) as well and give an extra boost to a match on that query using the boost parameter. If someone searches for irish bar then 1) would help rank matches better too.
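
The three interpretations above could be combined roughly as follows. This is only a sketch, not a tested production query: the sub-field name category_description.full (a copy of the field indexed without n-grams) is hypothetical, the boost values are arbitrary, and minimum_should_match is added here so that a category has to match at least one of the text clauses rather than only the type term.

```python
import json


def build_category_query(text,
                         full_field="category_description.full",   # hypothetical sub-field without n-grams
                         ngram_field="category_description"):
    """Build a bool query that tries several interpretations of the user's input."""
    return {
        "size": 100,
        "from": 0,
        "query": {
            "bool": {
                # Restrict to one category type, as in the original query.
                "must": [{"term": {"type": "BUSINESS"}}],
                "should": [
                    # 1) exact phrase on full words (arbitrary boost)
                    {"match_phrase": {full_field: {"query": text, "boost": 3.0}}},
                    # 2) all full words present, in any order (arbitrary boost)
                    {"match": {full_field: {"query": text, "operator": "and", "boost": 2.0}}},
                    # 3) partial match against the edge n-gram field
                    {"match": {ngram_field: {"query": text, "operator": "and",
                                             "fuzziness": "AUTO"}}},
                ],
                # Assumption: require at least one text clause to match.
                "minimum_should_match": 1,
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_category_query("irish bar"), indent=2))
```

A category that matches all three clauses scores higher than one that only matches through the n-grams, which is what pushes the exact bar and pub categories towards the top.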

RobBob · February 13, 2017, 11:24pm

Hello! Thank you for the reply.

Okay, I believe I see what you are saying about using several should clauses instead of one must.

Right now my current query looks like this:

{size=100,
query={
    bool={
        must=[{
            match={
                category_description={
                    fuzziness=AUTO,
                    query=bar,
                    operator=and
                }
            }
        }, {
            term={type=BUSINESS}
        }]
    }
}, 
from=0}

So instead of the one must I should have several shoulds. Something like this?

{size=100,
query={
    bool={
        should=[{
            match_phrase={
                category_description=bar
            },
            term={
                category_description=bar
            },
            match={
                category_description={
                    fuzziness=AUTO,
                    query=bar,
                    operator=and
                }
            }
        }, {
            term={type=BUSINESS}
        }]
    }
}, 
from=0}

What do you mean when you say, "also index minus n-grams"?

Thank you again for the reply!

RobBob · February 14, 2017, 12:47am

So my latest query looks like this:

{size=100,
query={
    bool={
        should=[
            {match_phrase={category_description=bar}},
            {term={category_description=bar}},
            {match={category_description={fuzziness=AUTO, query=bar, operator=and}}}
        ],
        must=[{term={type=BUSINESS}}]
    }
},
from=0}

And it has definitely solved my problem. I am still testing the other queries to make sure they are behaving properly but it sure looks like it! If you had a minute to look over my changes to confirm I understood you that would be fantastic. And as I mentioned before, I wasn't quite sure what you mean by minus the n-grams.

Thanks again!

Mark_Harwood · February 14, 2017, 8:57am

You can index the one source field in multiple different ways, e.g. with an edge n-gram based analyzer and also with a standard analyzer. They end up as two differently named fields in the search index. See "fields" in the Elasticsearch Guide [8.11].
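
As a rough illustration of that multi-field idea, the category_description mapping from the first post might gain a fields entry like the one below. The sub-field name full is an assumption made for this sketch, and the string type simply follows the mapping already shown in this thread (newer Elasticsearch versions use text instead).

```python
import json


def category_description_mapping(subfield_name="full"):  # sub-field name is hypothetical
    """Return a property mapping that indexes the same source text two ways."""
    return {
        "category_description": {
            "type": "string",
            # Main field: edge n-grams at index time, synonyms at search time.
            "analyzer": "edge_ngram_analyzer",
            "search_analyzer": "search_analyzer",
            "fields": {
                # Sub-field: whole words only, no n-grams.
                subfield_name: {
                    "type": "string",
                    "analyzer": "standard",
                }
            },
        }
    }


if __name__ == "__main__":
    print(json.dumps(category_description_mapping(), indent=2))
```

Queries then address the second copy as category_description.full. Because no search_analyzer is set on the sub-field, it is searched with its index analyzer, so the synonym filter would not apply there unless one is configured for it as well.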

RobBob · February 14, 2017, 4:40pm

Ahh okay I see. And you had mentioned I use the standard analyzed field for 1) and 2)?

Mark_Harwood · February 14, 2017, 4:45pm

That would make sense. I was illustrating a general pattern of using a range of matching methods (exact through to partial) where each can be given different levels of boost.
Ultimately it's up to you how much disk space/CPU/disk-seeks you want to throw at the matching problem with all these different approaches.
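
To see why the partial-match end of that range usually deserves the lowest boost, here is a plain-Python approximation of what an edge n-gram filter emits at index time. It is not Elasticsearch's implementation, and the category names are made up for the example.

```python
def edge_ngrams(token, min_gram=1, max_gram=20):
    """Return the prefixes of `token` between min_gram and max_gram characters long."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def index_terms(description):
    """Approximate the indexed terms: lowercase, split on whitespace, edge n-grams per word."""
    terms = set()
    for word in description.lower().split():
        terms.update(edge_ngrams(word))
    return terms


# Hypothetical category names, for illustration only.
categories = ["Bar", "Barber Shop", "Barbecue Restaurant", "Pub"]

# The search term "bar" is found in every category containing a word that starts with "bar".
matches = [c for c in categories if "bar" in index_terms(c)]

if __name__ == "__main__":
    print(matches)
```

With only the n-gram field to search, Bar, Barber Shop and Barbecue Restaurant all contain the term bar, so nothing sets the exact category apart; an additional full-word clause with a higher boost is what separates them.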

RobBob · February 14, 2017, 5:20pm

Okay, thank you for all your help. You have definitely enlightened me to the options I have though they seem more obvious now. I don't know how I missed this line of thought but I appreciate your help! Thank you

system · March 14, 2017, 5:20pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
