Match query doesn't work well with Genre Expansion Synonyms

Elasticsearch 7.2, node.js

Synonym data:

[ "khaki => khaki,green", "cat => cat,pet"]

Index mapping:

PutMapping content:

{
    settings: {
        "analysis": {
            "char_filter": {
                "same_word": {
                    "type": "mapping",
                    "mappings": ["-=>", "&=>and"]
                },
            },
            "filter": {
                "my_stopwords": {
                    "type": "stop",
                    "stopwords": STOPWORD_FILE
                },
                "my_synonym": {
                    "type": "synonym",
                    "synonyms": [ "khaki => khaki,green", "cat => cat,pet"],
                    "tokenizer": "whitespace"
                },
            },
            "analyzer": {
                "lowercaseWhiteSpaceAnalyzer": {
                    "type": "custom",
                    "char_filter": ["html_strip", "same_word"],
                    "tokenizer": "standard",
                    "filter": [
                        "lowercase",
                        "my_stopwords",
                        "my_synonym",
                    ]
                },
            }
        }
    }
}

Field mapping:

"phone_case":{"type":"text","norms":false,"analyzer":"lowercaseWhiteSpaceAnalyzer"}

Example documents:

 [
  {
      id: "1",
      phone_case: "khaki,brushed and polished",
  },
  {
      id: "2",
      phone_case: "green,brushed",
  },
  {
      id: "3",
      phone_case: "black,matte"
  }
]

The "phone_case" field is a text field.

When I search for khaki, I want only documents that contain khaki, excluding any that contain green. When I search for green, on the other hand, I want documents with either green or khaki. As I understand it, that is what Genre Expansion is supposed to do.

The term-level query works fine for this purpose:

{
  "sort": [
    {
      "updated": {
        "order": "desc"
      }
    }
  ],
  "size": 10,
  "from": 0,
  "query": {
    "bool": {
      "filter": {
        "term": {
          "phone_case": "khaki"
        }
      }
    }
  }
}

It returns only the documents with khaki.

But with match_phrase, it returns documents with either khaki or green. That isn't what I expected. I want to get documents that contain khaki, not green:

{
  "sort": [
    {
      "updated": {
        "order": "desc"
      }
    }
  ],
  "size": 10,
  "from": 0,
  "query": {
    "match_phrase": {
      "phone_case": "khaki"
    }
  }
}

Could anyone tell me why the match query is not able to exclude results that contain "green"? I want to allow users to search the text field in exact order, but match and match_phrase don't work well with Genre Expansion Synonyms.

You have configured an analyzer for the phone_case field, but no explicit search_analyzer. As a result, Elasticsearch will apply the same analyzer to your query terms - as long as you are not using term queries.

So when you search for khaki with a match or match_phrase query, you are actually searching for both khaki and green, because that is what your khaki search term gets expanded to. This doesn't happen with the term query, as it does not analyze the search term.
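To make the difference concrete, here is a toy model in plain JavaScript. It is only a sketch: the `analyze`, `termQuery` and `matchQuery` helpers are made up for illustration and greatly simplify what Elasticsearch does (no stop words, no char filters, no scoring). It assumes the synonym rules and example documents from the question.

```javascript
// Toy model of the question's setup; NOT Elasticsearch's real analysis chain.
const synonyms = new Map([
  ["khaki", ["khaki", "green"]],
  ["cat", ["cat", "pet"]],
]);

// Lowercase, split on non-alphanumerics, then expand each token via the synonym rules.
function analyze(text) {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter(Boolean)
    .flatMap((token) => synonyms.get(token) ?? [token]);
}

const docs = [
  { id: "1", phone_case: "khaki,brushed and polished" },
  { id: "2", phone_case: "green,brushed" },
  { id: "3", phone_case: "black,matte" },
];

// Index time: the analyzer runs on every document,
// so doc 1 ends up indexed under both "khaki" and "green".
const index = new Map();
for (const doc of docs) {
  for (const token of analyze(doc.phone_case)) {
    if (!index.has(token)) index.set(token, new Set());
    index.get(token).add(doc.id);
  }
}

// term query: the search term is looked up as-is, without analysis.
function termQuery(term) {
  return [...(index.get(term) ?? [])].sort();
}

// match query: the search text goes through the same analyzer first,
// so "khaki" becomes ["khaki", "green"] before the lookup.
function matchQuery(text) {
  const hits = new Set();
  for (const token of analyze(text)) {
    for (const id of index.get(token) ?? []) hits.add(id);
  }
  return [...hits].sort();
}

console.log(termQuery("khaki"));  // [ '1' ]
console.log(matchQuery("khaki")); // [ '1', '2' ]
```

In this model the term lookup for "khaki" only hits document 1, while the analyzed query also picks up document 2 through the expanded "green" token, which mirrors the behaviour described in the question.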

How to solve this? Add a search_analyzer to the mapping for phone_case:

      "phone_case": {
        "type": "text",
        "norms": false,
        "analyzer": "lowercaseWhiteSpaceAnalyzer",
        "search_analyzer": "standard"
      }

Now, your search terms will not get any synonyms applied and genre expansion should work as expected.

(Instead of the standard analyzer used in my example above, you may want a custom analyzer similar to lowercaseWhiteSpaceAnalyzer, but without the synonym filter.)
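A sketch of what such a custom search analyzer could look like, written as Node.js object literals like the settings in the question. The name lowercaseWhiteSpaceSearchAnalyzer is made up for this example, and the snippet assumes the same_word char filter and my_stopwords filter from the original settings are still defined alongside it.

```javascript
// Hypothetical analyzer name; this entry would sit next to
// lowercaseWhiteSpaceAnalyzer under settings.analysis.analyzer.
const searchAnalyzer = {
  lowercaseWhiteSpaceSearchAnalyzer: {
    type: "custom",
    char_filter: ["html_strip", "same_word"],
    tokenizer: "standard",
    // Same chain as the index analyzer, minus "my_synonym".
    filter: ["lowercase", "my_stopwords"],
  },
};

// Field mapping: synonyms are applied at index time only.
const phoneCaseMapping = {
  phone_case: {
    type: "text",
    norms: false,
    analyzer: "lowercaseWhiteSpaceAnalyzer",
    search_analyzer: "lowercaseWhiteSpaceSearchAnalyzer",
  },
};
```

Keeping the char filters, lowercasing and stop words identical on both sides means query terms are normalized the same way as the indexed text, and only the synonym expansion differs.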


Thank you for the explanation. I had to use this mapping for phone_case instead:

phone_case: {
    type: "text",
    norms: false,
    analyzer: "standard",
    search_analyzer: "lowercaseWhiteSpaceAnalyzer"
}

And change the synonym format from

[ "khaki => khaki,green", "cat => cat,pet"]

to

[ "green => khaki,green", "pet => cat,pet"]

Now the genre expansion is working properly in both "term" and "match_phrase" queries.
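For comparison, the same kind of toy model for this search-time variant, where documents are indexed without synonyms and only the query text is expanded. Again this is a simplified sketch with made-up helper names (`tokenize`, `matchQuery`), not Elasticsearch's actual behaviour, and it only models analyzed queries such as match.

```javascript
// Toy model: plain analysis at index time, synonym expansion only at search time.
const searchSynonyms = new Map([
  ["green", ["khaki", "green"]],
  ["pet", ["cat", "pet"]],
]);

// Lowercase and split on non-alphanumerics; no synonyms here.
function tokenize(text) {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

const docs = [
  { id: "1", phone_case: "khaki,brushed and polished" },
  { id: "2", phone_case: "green,brushed" },
  { id: "3", phone_case: "black,matte" },
];

// Analyzed query: expand the query tokens, then match against the plain document tokens.
function matchQuery(text) {
  const queryTokens = tokenize(text).flatMap((t) => searchSynonyms.get(t) ?? [t]);
  return docs
    .filter((doc) => tokenize(doc.phone_case).some((t) => queryTokens.includes(t)))
    .map((doc) => doc.id);
}

console.log(matchQuery("khaki")); // [ '1' ]
console.log(matchQuery("green")); // [ '1', '2' ]
```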
