Synonyms: Explicit mapping behaving like Equivalent synonyms


(David Baughman) #1

For context, I'm using AWS Elasticsearch service, v6.2

I am trying to define a set of simple contraction synonyms for an index, but they are behaving like equivalent synonyms. For example, we have a multivitamin product called "Peak Performance", so I want a search for "vitamin" to match "peak performance". I'm using the following explicit mapping (AKA simple contraction) synonym:

"vitamin, supplement => peak performance"

But what's happening is that I search for "peak performance" and also get matches for "vitamin", which is what I would have expected for an equivalent synonym, but not what I'm trying to do. The ES documentation says:

Explicit mappings match any token sequence on the LHS of "=>" and replace with all alternatives on the RHS. These types of mappings ignore the expand parameter in the schema.

I thought that maybe my simple contraction was getting expanded, so I added "expand": "false" to my custom analyzer, but that didn't make any difference. I'm including the contents of my index configuration, as well as an example query with response below. Please let me know what I'm doing wrong.

Analyzer Settings

{
  "filter": {
    "english_keywords": {
      "keywords": [
        "example"
      ],
      "type": "keyword_marker"
    },
    "english_stemmer": {
      "type": "stemmer",
      "language": "english"
    },
    "synonyms_en": {
      "type": "synonym",
      "expand": "false",
      "synonyms": [
        "vitamin, supplement => peak performance",
        "soap => wash",
        "protein => access",
        "kids => koala"
      ]
    },
    "english_possessive_stemmer": {
      "type": "stemmer",
      "language": "possessive_english"
    },
    "english_stop": {
      "type": "stop",
      "stopwords": "_english_"
    }
  },
  "analyzer": {
    "custom_en": {
      "filter": [
        "english_possessive_stemmer",
        "lowercase",
        "synonyms_en",
        "english_stop",
        "english_keywords",
        "english_stemmer"
      ],
      "tokenizer": "standard"
    }
  }
}

POST Query Body

Only searching/returning the name.regular field to try to isolate the issue.

{
  "from": 0,
  "size": 10,
  "_source": [
    "name.regular"
  ],
  "query": {
    "bool": {
      "must": {
        "multi_match": {
          "query": "peak performance",
          "fuzziness": "auto",
          "operator": "and",
          "type": "most_fields",
          "fields": [
            "name.regular.en"
          ]
        }
      }
    }
  },
  "post_filter": {
    "term": {"channel_id": 1}
  }
}

Query Response

I would expect only items with "peak performance" to be returned, but I'm getting results with "vitamin" or "supplement" as well, which is the behavior of an equivalent synonym, not of an explicit mapping like the one I wrote.

{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 28,
    "max_score": 9.393171,
    "hits": [
      {
        "_index": "en_us_primary",
        "_type": "_doc",
        "_id": "product_2799_1",
        "_score": 9.393171,
        "_source": {
          "name": {
            "regular": "Vitality Vitamin D3"
          }
        }
      },
      {
        "_index": "en_us_primary",
        "_type": "_doc",
        "_id": "product_3068_1",
        "_score": 8.375387,
        "_source": {
          "name": {
            "regular": "Calmicid Antacid Supplement"
          }
        }
      },
      {
        "_index": "en_us_primary",
        "_type": "_doc",
        "_id": "product_861_1",
        "_score": 8.023193,
        "_source": {
          "name": {
            "regular": "Peak Performance Men Save $47.95"
          }
        }
      },
      {
        "_index": "en_us_primary",
        "_type": "_doc",
        "_id": "product_862_1",
        "_score": 7.4778767,
        "_source": {
          "name": {
            "regular": "Peak Performance Longevity 50+ Save $47.95"
          }
        }
      },
      {
        "_index": "en_us_primary",
        "_type": "_doc",
        "_id": "product_980_1",
        "_score": 7.229685,
        "_source": {
          "name": {
            "regular": "Peak Performance Brain Women Save $90.92"
          }
        }
      },
      {
        "_index": "en_us_primary",
        "_type": "_doc",
        "_id": "product_7798_1",
        "_score": 7.229685,
        "_source": {
          "name": {
            "regular": "Peak Performance Heart Men Save $93.92"
          }
        }
      },
      {
        "_index": "en_us_primary",
        "_type": "_doc",
        "_id": "product_2708_1",
        "_score": 6.8760505,
        "_source": {
          "name": {
            "regular": "Sei Bella Fortifying Vitamin Lotion"
          }
        }
      },
      {
        "_index": "en_us_primary",
        "_type": "_doc",
        "_id": "product_967_1",
        "_score": 6.7668524,
        "_source": {
          "name": {
            "regular": "Peak Performance Metabolic Pack Men Save $78.93"
          }
        }
      },
      {
        "_index": "en_us_primary",
        "_type": "_doc",
        "_id": "product_860_1",
        "_score": 6.45929,
        "_source": {
          "name": {
            "regular": "Peak Performance Women Save $47.95"
          }
        }
      },
      {
        "_index": "en_us_primary",
        "_type": "_doc",
        "_id": "product_592_1",
        "_score": 6.414261,
        "_source": {
          "name": {
            "regular": "Peak Performance Total Men Save $131.89"
          }
        }
      }
    ]
  }
}

#2

I think it's the other way around!
The rule "vitamin, supplement => peak performance" means:
whenever a document contains "vitamin" or "supplement", that token is replaced by "peak performance" when the document is indexed.
So when you search for "peak performance", you also get the documents that originally contained the vitamin or supplement token.

The other way:
"peak performance => peak performance, vitamin, supplement"
So during the search:

  • for vitamin: returns documents with "vitamin" and also "peak performance" documents
  • for peak performance: returns documents with "peak performance" only
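
Another option, offered as an untested sketch rather than a confirmed fix, is to keep the original rule "vitamin, supplement => peak performance" but stop it from rewriting documents, by defining two analyzers and applying the synonym filter only in the one used at search time. The analyzer names custom_en_index and custom_en_search are hypothetical, and the filters are assumed to be the ones defined in the original post:

```json
{
  "analyzer": {
    "custom_en_index": {
      "tokenizer": "standard",
      "filter": [
        "english_possessive_stemmer",
        "lowercase",
        "english_stop",
        "english_keywords",
        "english_stemmer"
      ]
    },
    "custom_en_search": {
      "tokenizer": "standard",
      "filter": [
        "english_possessive_stemmer",
        "lowercase",
        "synonyms_en",
        "english_stop",
        "english_keywords",
        "english_stemmer"
      ]
    }
  }
}
```

The field mapping would then set "analyzer": "custom_en_index" and "search_analyzer": "custom_en_search", and existing documents would most likely need to be reindexed. Under this assumption a search for "vitamin" is rewritten to "peak performance" before matching, while documents containing "vitamin" keep their original tokens.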

(David Baughman) #3

@klof thanks for the reply. I tried your suggestion and the result is very similar, though the relevance of the results is a little different. I still get results with "vitamin" or "supplement" when searching for "peak performance" (still searching the name.regular field only, as before).

According to the ES documentation, the "other" way you are describing is called "Genre Expansion":

...genre expansion widens the meaning of a term to be more generic.

What I'm trying to accomplish is referred to in the docs as a "simple contraction", or "explicit mapping", which maps one or more terms on the left to one or more terms on the right.

From everything I've read, I believe I'm formatting the synonyms correctly, but I must be missing something else that prevents a simple contraction from working normally.


#4

Have you tried debugging this to understand the issue?
Add explain to your query to see how the score is calculated.
More importantly, use the _analyze API to see how the documents are indexed in Elasticsearch.
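
For the explain part, a minimal sketch of a search body based on the query from the original post, with "explain" switched on. The bool wrapper and post_filter are left out here only to keep the example short:

```json
{
  "explain": true,
  "_source": [
    "name.regular"
  ],
  "query": {
    "multi_match": {
      "query": "peak performance",
      "fuzziness": "auto",
      "operator": "and",
      "type": "most_fields",
      "fields": [
        "name.regular.en"
      ]
    }
  }
}
```

Each hit should then carry an _explanation object, which makes it possible to check which indexed terms a product such as "Vitality Vitamin D3" actually matched on.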

For example, what does the _analyze API return for:

GET index/_analyze
{
  "field" : "name.regular.en",
  "text" : "Peak Performance Women Save $47.95"
}

And:

GET index/_analyze
{
  "field" : "name.regular.en",
  "text" : "Vitality Vitamin D3"
}

The _analyze response shows whether Elasticsearch adds the synonyms alongside the original words, replaces the words with them, or leaves the words unchanged.
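
To isolate the synonym rule from the rest of the analysis chain, the _analyze API also accepts an inline tokenizer and filter list. A minimal sketch, sent as the body of GET _analyze, assuming only the standard tokenizer, the lowercase filter, and the single rule from the original post:

```json
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    {
      "type": "synonym",
      "synonyms": [
        "vitamin, supplement => peak performance"
      ]
    }
  ],
  "text": "Vitality Vitamin D3"
}
```

If the returned tokens contain peak and performance in place of vitamin, the rule is rewriting the document text itself, which would explain why a search for "peak performance" matches this product.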

