Generating shingles with synonyms in Elasticsearch

Yawan_Gupta · January 20, 2022, 2:58pm

I have a CSV file of alternate spellings for terms in my index, and I want to produce bigrams that include those alternate spellings. For example, the file lists biriyani, biryani, briyani, and my field contains the text Chicken Biryani. I want to produce the tokens chicken biryani, chicken biriyani, chicken briyani.

If I use a whitespace tokenizer with a synonym filter, I get the tokens chicken, biriyani, biryani, briyani, which is expected. If I then add a shingle filter after the synonym filter, the tokens are chicken, chicken biryani, biryani, biryani biriyani, biriyani, biriyani briyani, briyani. This stream has two problems: it contains shingles built from synonyms of the same word (biryani biriyani, biriyani briyani), which should not be there, and it lacks the shingles that pair chicken with the alternate spellings, such as chicken biriyani or chicken briyani.

If I place the shingle filter before the synonym filter instead, synonyms are only added for the unigram: chicken, chicken biryani, biriyani, biryani, briyani. Is there a way to generate shingles in which the synonyms occupy the same position as the original token, so that in this case I get chicken biryani, chicken biriyani, chicken briyani?
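To make the desired output concrete, here is a rough Python sketch of the behaviour I am after. It is plain Python, not Elasticsearch analysis code. The names `SYNONYMS`, `expand`, and `position_aware_bigrams` are made up for illustration. It lowercases the input and treats every alternate spelling as sharing one position with the original token.

```python
from itertools import product

# Hypothetical stand-in for the alternate-spellings file: one set per term.
SYNONYMS = [{"biriyani", "biryani", "briyani"}]


def expand(token):
    """Return the token plus any alternate spellings that share its position."""
    for group in SYNONYMS:
        if token in group:
            return sorted(group)
    return [token]


def position_aware_bigrams(text):
    """Build bigrams across adjacent positions, never within one position."""
    # Each entry holds all alternatives for a single token position.
    positions = [expand(tok) for tok in text.lower().split()]
    bigrams = []
    for left, right in zip(positions, positions[1:]):
        # Pair every alternative on the left with every alternative on the right.
        bigrams.extend(f"{a} {b}" for a, b in product(left, right))
    return bigrams


print(position_aware_bigrams("Chicken Biryani"))
# → ['chicken biriyani', 'chicken biryani', 'chicken briyani']
```

Since the synonyms are alternatives at one position rather than consecutive tokens, no shingle such as biryani biriyani is ever produced.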

I am running Elasticsearch 5.6. Sample settings for testing:

PUT test_bigram
{
  "settings": {
    "index": {
      "analysis": {
        "filter": {
          "synonym": {
            "type": "synonym",
            "synonyms": [
              "biriyani, biryani, briyani"
            ]
          }
        },
        "analyzer": {
          "synonym_analyzer": {
            "filter": [
              "synonym"
            ],
            "type": "custom",
            "tokenizer": "whitespace"
          },
          "shingle_synonym": {
            "type": "custom",
            "tokenizer": "whitespace",
            "filter": [
              "shingle",
              "synonym"
            ]
          },
          "synonym_shingle": {
            "type": "custom",
            "tokenizer": "whitespace",
            "filter": [
              "synonym",
              "shingle"
            ]
          }
        }
      }
    }
  }
}

