PerFieldSimilarity for synonym expansion at query time

guilherme_maranhao · July 21, 2017, 12:34am

Hi everybody,

The guide (https://www.elastic.co/guide/en/elasticsearch/guide/current/synonyms-expand-or-contract.html) says that applying simple synonym expansion at query time is an advantage for relevance. Please consider my scenario.

My simple expansion synonym rule is:

shirt, blouse

The index structure is:

{
	"my_index": {
		"aliases": {},
		"mappings": {
			"my_type": {
				"properties": {
					"name": {
						"type": "text",
						"fields": {
							"keyword": {
								"type": "keyword",
								"ignore_above": 256
							}
						}
					}
				}
			}
		},
		"settings": {
			"index": {
				"analysis": {
					"filter": {
						"brazilian_stop": {
							"type": "stop",
							"stopwords": "_brazilian_"
						},
						"synonym_filter": {
							"type": "synonym",
							"synonyms_path": "sinonimos.txt"
						}
					},
					"analyzer": {
						"synonym_brazilian_analyzer": {
							"filter": [
								"lowercase",
								"asciifolding",
								"synonym_filter",
								"brazilian_stop"
							],
							"tokenizer": "standard"
						}
					}
				}
			}
		}
	}
}

Note that I'm not applying the synonym analyzer at index time.

And there are only 3 documents:

_id = 1
{
    "name": "shirt xyz"
}

_id = 2
{
    "name": "blouse xyz"
}

_id = 3
{
    "name": "blouse wvc"
}

My query is:

{
	"query": {
		"query_string": {
			"fields": ["name"],
			"query": "shirt",
			"analyzer": "synonym_brazilian_analyzer"
		}
	}
}

If I search for "shirt" and apply the synonym analyzer only at query time, shouldn't the _id=1 document score higher than the _id=2 one?

I'm asking because, according to the explain output, both have exactly the same score.
My point is: what exactly is the query-time advantage for relevance with simple expansion?
How does the PerFieldSimilarity calculation come into it?
Shouldn't the text "shirt xyz" be more relevant than "blouse xyz" at query time?

Thanks a lot,

Guilherme

jpountz · July 21, 2017, 2:01pm

I get the same score for all documents when I run the following:

DELETE index

PUT index 
{
  "settings": {
    "number_of_shards": 1, 
    "analysis": {
      "filter": {
        "my_syns": {
          "type": "synonym",
          "synonyms" : [
            "shirt,blouse"
          ]
        }
      },
      "analyzer": {
        "my_index_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace"
        },
        "my_search_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["my_syns"]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "field": {
          "type": "text",
          "analyzer": "my_index_analyzer",
          "search_analyzer": "my_search_analyzer"
        }
      }
    }
  }
}

PUT index/doc/1
{
  "field": "shirt xyz"
}

PUT index/doc/2
{
  "field": "blouse xyz"
}

GET index/_search
{
  "query": {
    "match": {
      "field": "shirt"
    }
  }
}

guilherme_maranhao · July 21, 2017, 3:33pm

But is that the expected behavior?

jpountz · July 21, 2017, 3:35pm

To me it is. We do this on purpose on the Lucene side: the query is built as a SynonymQuery, which merges the statistics of the synonyms so that it does not matter which synonym a document contains.
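
A rough sketch of what "merging statistics" can look like, using the three documents from the original post. This is not the Lucene code. It assumes a plain BM25 formula with default k1 and b, assumes the merged document frequency is the maximum over the synonyms and the merged term frequency is their sum, and ignores Lucene's lossy length encoding. The function name `bm25_synonym_score` is made up for illustration.

```python
import math

# Toy corpus from the thread, tokenized on whitespace.
docs = {1: "shirt xyz", 2: "blouse xyz", 3: "blouse wvc"}
tokens = {doc_id: text.split() for doc_id, text in docs.items()}


def bm25_synonym_score(doc_tokens, synonyms, all_docs, k1=1.2, b=0.75):
    """Score one document against a synonym group treated as a single term.

    Assumption (not taken from the thread): document frequency is the max
    over the synonyms, and in-document frequency is the sum over them.
    """
    n_docs = len(all_docs)
    avg_len = sum(len(t) for t in all_docs.values()) / n_docs
    # Merged document frequency: the largest df among the synonyms.
    df = max(sum(1 for t in all_docs.values() if s in t) for s in synonyms)
    idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
    # Merged term frequency: occurrences of any synonym in this document.
    tf = sum(doc_tokens.count(s) for s in synonyms)
    norm = tf + k1 * (1 - b + b * len(doc_tokens) / avg_len)
    return idf * tf * (k1 + 1) / norm


scores = {
    doc_id: bm25_synonym_score(t, ["shirt", "blouse"], tokens)
    for doc_id, t in tokens.items()
}
for doc_id, score in sorted(scores.items()):
    print(doc_id, round(score, 4))
```

Under these assumptions every document gets the same idf and the same tf of 1, and all fields have the same length, so the scores come out identical whichever synonym a document contains.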

guilherme_maranhao · July 21, 2017, 3:37pm

OK, but what about the advantage of analyzing at query time mentioned in the guide (https://www.elastic.co/guide/en/elasticsearch/guide/current/synonyms-expand-or-contract.html)?

It says: "The IDF for each synonym will be correct."

Thank you!

guilherme_maranhao · July 21, 2017, 8:08pm

Hi,

I got it! At query time, the IDF takes into account each of the synonyms across the whole index, not only the specific term being searched.
So the documents that contain the most relevant synonym will have the highest scores.

Thank you
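
To make the IDF point from the guide concrete, here is a small sketch that computes per-term IDF for the three documents in this thread, once as indexed here (no synonyms at index time) and once as if simple expansion had been applied at index time. It assumes the BM25 IDF formula; the helper names `bm25_idf`, `doc_freqs`, and `expand` are hypothetical. It only shows per-term numbers: as jpountz notes above, SynonymQuery merges the statistics when the query is actually scored.

```python
import math


def bm25_idf(doc_freq, n_docs):
    """BM25-style inverse document frequency for a single term."""
    return math.log(1 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def doc_freqs(token_lists):
    """Count in how many documents each term appears."""
    freqs = {}
    for toks in token_lists:
        for term in set(toks):
            freqs[term] = freqs.get(term, 0) + 1
    return freqs


def expand(toks, group):
    """Simple expansion: replace any synonym with the whole synonym group."""
    out = []
    for tok in toks:
        out.extend(group if tok in group else [tok])
    return out


docs = ["shirt xyz", "blouse xyz", "blouse wvc"]
group = ["shirt", "blouse"]

plain = [d.split() for d in docs]            # synonyms applied only at query time
expanded = [expand(t, group) for t in plain]  # as if expanded at index time

n = len(docs)
plain_idf = {t: bm25_idf(df, n) for t, df in doc_freqs(plain).items()}
expanded_idf = {t: bm25_idf(df, n) for t, df in doc_freqs(expanded).items()}

for term in group:
    print(term, round(plain_idf[term], 4), round(expanded_idf[term], 4))
```

Without index-time expansion, "shirt" (one document) keeps a higher IDF than "blouse" (two documents). With index-time expansion both terms appear in every document, so their IDFs collapse to the same low value.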

system · August 18, 2017, 8:08pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
