How to generate shingles before a synonyms filter?

Hi,

I understand that token filters which produce multiple tokens can't be placed before a synonyms filter (see "Ensure TokenFilters only produce single tokens when parsing synonyms" by romseygeek, Pull Request #34331, elastic/elasticsearch).

I was trying to use a shingle filter to generate combinations of words before the synonyms filter, but that's not working. Can you suggest an alternative solution for the scenario below?

(I already have a phrase list filter that takes a predefined list of phrases from a file, but it doesn't cover all the phrases. I wanted a more dynamic solution.)

Synonyms

"heart attack", "cardiac arrest"

Input text

"Symptoms of heart attack"

"heart", "attack", "cardiac" and "arrest" should not be mapped to their respective synonyms individually; only the combination "heart attack" should be mapped to "cardiac arrest".

Please let me know the best way to achieve this.

Hi @mattkallo.

Take a look at this example; it may help you find a solution. The synonyms filter runs first and the shingle filter runs after it:

PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "synonyms_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "synonyms_filter",
            "shingle_filter"
          ]
        }
      },
      "filter": {
        "synonyms_filter": {
          "type": "synonym",
          "synonyms": [
            "heart attack => cardiac arrest"
          ]
        },
        "shingle_filter": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 3,
          "output_unigrams": false
        }
      }
    }
  }
}

GET test/_analyze
{
  "analyzer": "synonyms_analyzer",
  "text": ["heart attack"]
}

Tokens:

{
  "tokens" : [
    {
      "token" : "cardiac arrest",
      "start_offset" : 0,
      "end_offset" : 12,
      "type" : "shingle",
      "position" : 0
    }
  ]
}
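
To check the same analyzer against the full input text from the question, a request along these lines could be used. This is only a sketch: it assumes the test index above has already been created, and the output is not shown here because it should be verified on your own cluster. Note that with min_shingle_size 2 and output_unigrams set to false, shingles that span neighbouring words (not just the synonym phrase) should also be expected.

```
# Assumes the "test" index and "synonyms_analyzer" defined above exist.
GET test/_analyze
{
  "analyzer": "synonyms_analyzer",
  "text": ["Symptoms of heart attack"]
}
```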

Thanks @RabBit_BR for the response. I tried this approach, but won't it create contextually invalid combinations? For example, in the case above, if "heart" is mapped to its own synonyms and "attack" is mapped to its own set of synonyms, then running the synonyms filter before the shingle filter will output contextually invalid combinations. Am I wrong in this observation?
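
One way to check this observation might be to add single-word rules next to the phrase rule and inspect the tokens. The sketch below is an assumption-laden test setup, not a recommended configuration: the index name test_overlap and the single-word synonyms ("core", "assault") are hypothetical and chosen only for illustration. No output is shown, since how overlapping rules are resolved should be confirmed on the Elasticsearch version in use.

```
# Hypothetical index name and single-word rules, for testing only.
PUT /test_overlap
{
  "settings": {
    "analysis": {
      "analyzer": {
        "synonyms_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "synonyms_filter",
            "shingle_filter"
          ]
        }
      },
      "filter": {
        "synonyms_filter": {
          "type": "synonym",
          "synonyms": [
            "heart => core",
            "attack => assault",
            "heart attack => cardiac arrest"
          ]
        },
        "shingle_filter": {
          "type": "shingle",
          "min_shingle_size": 2,
          "max_shingle_size": 3,
          "output_unigrams": false
        }
      }
    }
  }
}

# Inspect whether the phrase rule or the single-word rules are applied.
GET test_overlap/_analyze
{
  "analyzer": "synonyms_analyzer",
  "text": ["Symptoms of heart attack"]
}
```

If the tokens contain "cardiac arrest" but no shingles built from "core" or "assault", the phrase rule is taking precedence for this input; if they do appear, the concern above applies and the single-word rules would need to be reviewed.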
