Why does the Completion Suggester with an edge ngram analyzer take 15 to 17 times more index space than the default analyzer?

I am implementing a completion suggester on multiple fields whose values contain alphanumeric characters and colons (for example, AA:890090:xyz:9090).
With the default analyzer for the completion suggester (the simple analyzer), I cannot get suggestions for input such as AA:890, because the simple analyzer keeps only letters when it tokenizes.
To work around this, we switched to an edge ngram analyzer. That fixes the suggestions, but the index is 15 to 17 times larger than with the default analyzer.
For example, the index is around 3 GB with the default analyzer and about 50 GB with the ngram analyzer.

Why is the index so much larger with the ngram analyzer than with the default analyzer? The Elasticsearch mapping is below.
Please suggest a better way to achieve this, if there is one.

Sample data:
AA:890090:xyz:9090

Sample Mapping :

{
  "settings": {
    "analysis": {
      "filter": {
        "ngram_filter": {
          "type": "edge_ngram",
          "min_gram": 3,
          "max_gram": 40
        }
      },
      "analyzer": {
        "ngram_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "ngram_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "field1Suggest": {
          "type": "completion",
          "analyzer": "ngram_analyzer",
          "search_analyzer": "whitespace"
        },
        "field2Suggest": {
          "type": "completion",
          "analyzer": "ngram_analyzer",
          "search_analyzer": "whitespace"
        }
      }
    }
  }
}

You have a price to pay either way:

  • At index time, using ngrams: this slows down ingestion and takes much more space.
  • At search time, using wildcards: this slows down the search.
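
To make the index-time cost concrete, here is a rough Python sketch of what the ngram_filter from the question (min_gram 3, max_gram 40) does to the sample value. The edge_ngrams helper is my own illustration, not Elasticsearch code, and it only counts terms; it says nothing about how Lucene actually lays the suggester out on disk.

```python
def edge_ngrams(token, min_gram=3, max_gram=40):
    """Return the prefixes of `token` whose length lies in [min_gram, max_gram]."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


# The lowercase filter runs before ngram_filter in the question's analyzer.
sample = "AA:890090:xyz:9090".lower()
grams = edge_ngrams(sample)

print(len(sample))  # characters in the single whitespace-delimited token
print(len(grams))   # terms emitted for that one token
print(grams[:3])    # the shortest prefixes that get indexed
```

The 18-character sample yields 16 terms instead of 1, which is in the same range as the 15 to 17 times growth reported above. I would treat that as a plausible explanation rather than a proven one.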

Something you can look at: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters.html

Maybe that can help, but I'm not sure.
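
One thing that might be worth trying, since the completion suggester already matches on prefixes: keep the whitespace tokenizer and lowercase filter but drop the edge ngram filter, so each value is indexed once. The sketch below only builds and prints such a request body in Python. The analyzer name suggest_analyzer is hypothetical, the "doc" mapping type is copied from the question, and I have not tested this against a cluster.

```python
import json

# Hypothetical alternative to the mapping in the question: same tokenizer and
# lowercase filter, but no edge_ngram filter, so each token is kept whole.
settings = {
    "analysis": {
        "analyzer": {
            "suggest_analyzer": {  # hypothetical name, not from the question
                "type": "custom",
                "tokenizer": "whitespace",
                "filter": ["lowercase"],
            }
        }
    }
}


def completion_field():
    """Completion field using the same analyzer at index and search time."""
    return {"type": "completion", "analyzer": "suggest_analyzer"}


body = {
    "settings": settings,
    "mappings": {
        "doc": {  # mapping type kept as in the question
            "properties": {
                "field1Suggest": completion_field(),
                "field2Suggest": completion_field(),
            }
        }
    },
}

print(json.dumps(body, indent=2))
```

With this, AA:890 should be analyzed to aa:890 and matched as a prefix of aa:890090:xyz:9090. It would only suggest from the start of the value, so matching on a middle segment such as xyz would still need something else, for example additional inputs per document.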
