Highlighting with edge ngram token filter + synonym filter

Hi,
For my use case I need to use both an edge-ngram token filter and a synonym filter, and then highlight the matching tokens in the results.
Because I need both edge n-grams and synonyms, I have to use the edge-ngram token filter rather than the edge-ngram tokenizer, and apply the filters in the order synonym filter -> edge-ngram token filter.

However, this creates an issue with highlighting (which, I think, is caused by the missing position increments).

Please see the image below for the highlight produced when I search for "index": even "industrial" gets highlighted as a whole.

    PUT test1/
    {
      "settings": {
        "analysis": {
          "filter": {
            "my_edge_ngram_filter": {
              "token_chars": [
                "letter",
                "digit"
              ],
              "min_gram": "1",
              "type": "edgeNGram",
              "max_gram": "12"
            },
            "synonym_normal": {
              "type": "synonym",
              "synonyms": [
                "index, bond",
                "industrial, industry"
              ]
            }
          },
          "tokenizer": {
            "my_edge_ngram_tokenizer": {
              "token_chars": [
                "letter",
                "digit"
              ],
              "min_gram": "1",
              "type": "edgeNGram",
              "max_gram": "12"
            }
          },
          "analyzer": {
            "synonym_edgengram": {
              "filter": [
                "synonym_normal",
                "my_edge_ngram_filter"
              ],
              "tokenizer": "whitespace",
              "type" : "custom"
            },
            "edgengram_tokenizer": {
              "tokenizer": "my_edge_ngram_tokenizer",
              "type" : "custom"
            }
          }
        }
      },
      "mappings": {
        "test1": {
          "properties": {
            "name": {
              "type": "text",
              "fields": {
                "field_synonym_edgengram": {
                  "type": "text",
                  "analyzer": "synonym_edgengram",
                  "fielddata": true
                },
                "field_edgengram_tokenizer": {
                  "type": "text",
                  "analyzer": "edgengram_tokenizer",
                  "fielddata": true
                }
              }
            }
          }
        }
      }
    }

Indexing one document:

    POST test1/test1/
    {
      "name": "index industrial"
    }

Query1 with highlight:

    GET test1/_search
    {
      "query": {
        "match": {
          "name.field_synonym_edgengram": "index "
        }
      },
      "highlight": {
        "fields": {
          "name.field_synonym_edgengram": {}
        }
      }
    }

Result:
image

As you can see, "industrial" gets highlighted as a whole, instead of just "ind".
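
The token stream behind this result can be inspected with the _analyze API. This is only a minimal sketch, assuming the test1 index above has been created. The response lists every emitted token with its position, start_offset and end_offset, which is presumably the information the highlighter works from:

    GET test1/_analyze
    {
      "analyzer": "synonym_edgengram",
      "text": "index industrial"
    }

If the grams of "industrial" ("i", "in", "ind", ...) all report the offsets of the whole word, that would be consistent with the whole word being wrapped in the highlight tags.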

Now, if I run the same query using the edge-ngram tokenizer (and without synonyms):

Query2:

    GET test1/_search
    {
      "query": {
        "match": {
          "name.field_edgengram_tokenizer": "index "
        }
      },
      "highlight": {
        "fields": {
          "name.field_edgengram_tokenizer": {}
        }
      }
    }

It gets properly highlighted:
image

I believe this is due to the position increments produced by the edge-ngram tokenizer, which the edge-ngram token filter does not produce. Is there any way to get around this highlighting issue?
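
One way to test this guess would be to analyze the sample text through the tokenizer-based field and look at what each gram reports. Again this is only a sketch, assuming the test1 index and mapping above. The field parameter tells _analyze to use whichever analyzer is mapped to that field:

    GET test1/_analyze
    {
      "field": "name.field_edgengram_tokenizer",
      "text": "index industrial"
    }

Comparing the position, start_offset and end_offset values here with those produced for name.field_synonym_edgengram should show whether the two token streams differ in their position increments, in their offsets, or in both.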
