Why does a Completion Suggester with an Edge Ngram analyzer take 15 to 17 times more index space than the default analyzer?

sharma.puneet3 · May 28, 2018, 6:16am

I am implementing a completion suggester on multiple fields whose values contain alphanumeric characters and colons (for example, AA:890090:xyz:9090).
When I use the default analyzer for the completion suggester (the simple analyzer), I cannot get suggestions for a prefix such as AA:890, because the simple analyzer splits on non-letter characters and keeps only alphabetic tokens.
To overcome this, we used an edge ngram analyzer. It fixes the suggestions, but the index is 15 to 17 times larger than with the default analyzer.
For example, the index is around 3 GB with the default analyzer and about 50 GB with the ngram analyzer.

Why is the index so much larger with the ngram analyzer than with the default analyzer? The Elasticsearch mapping is below.
Please suggest a better way to do this, if there is one.

Sample data:
AA:890090:xyz:9090

Sample Mapping :

{
  "settings": {
    "analysis": {
      "filter": {
        "ngram_filter": {
          "type": "edge_ngram",
          "min_gram": 3,
          "max_gram": 40
        }
      },
      "analyzer": {
        "ngram_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "ngram_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "field1Suggest": {
          "type": "completion",
          "analyzer": "ngram_analyzer",
          "search_analyzer": "whitespace"
        },
        "field2Suggest": {
          "type": "completion",
          "analyzer": "ngram_analyzer",
          "search_analyzer": "whitespace"
        }
      }
    }
  }
}

dadoonet · May 29, 2018, 7:02am

You have a price to pay:

At index time, using ngrams: this slows down ingestion and takes much more space.
At search time, using wildcards: this slows down the search.
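
To give a rough feel for the index-time cost, here is a small Python sketch that imitates what the analyzer in the question does to the sample value: split on whitespace, lowercase, then emit edge n-grams with min_gram 3 and max_gram 40. The function name `edge_ngrams` is made up for this illustration, and the sketch only counts tokens; it is not a model of how Lucene actually stores completion data on disk.

```python
def edge_ngrams(token, min_gram=3, max_gram=40):
    """Return the prefixes an edge n-gram filter would emit for one token."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


sample = "AA:890090:xyz:9090"

# Imitate the whitespace tokenizer plus the lowercase filter.
# The value has no spaces, so it stays a single token (colons are kept).
tokens = [t.lower() for t in sample.split()]

# Imitate the edge_ngram filter from the question (min_gram=3, max_gram=40).
grams = [g for t in tokens for g in edge_ngrams(t)]

print("tokens without n-grams:", len(tokens))
print("tokens with edge n-grams:", len(grams))
print("characters without n-grams:", sum(len(t) for t in tokens))
print("characters with edge n-grams:", sum(len(g) for g in grams))
```

For the sample value, one 18-character token turns into 16 prefixes (lengths 3 to 18), which is in the same range as the 15 to 17 times growth reported above. The real on-disk ratio depends on value lengths and on how the suggester encodes its entries, so treat this as an order-of-magnitude check only.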

Something you can look at: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters.html

Maybe that can help, but I'm not sure.
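
One thing that may be worth trying, as an untested assumption rather than a confirmed fix: the completion suggester already matches on prefixes of the analyzed input, and with the whitespace tokenizer the edge n-grams in the question are just prefixes of the whole value. So an analyzer that keeps the colons and digits but drops the edge_ngram filter might give the same suggestions at a much smaller size. The Python sketch below only builds the request bodies as dictionaries and prints them; the analyzer name `lowercase_whitespace` and the suggestion name `field1` are hypothetical, and nothing here has been run against a cluster.

```python
import json

# Hypothetical analyzer: same tokenizer and lowercase filter as in the
# question, but WITHOUT the edge_ngram filter.
completion_field = {"type": "completion", "analyzer": "lowercase_whitespace"}

index_body = {
    "settings": {
        "analysis": {
            "analyzer": {
                "lowercase_whitespace": {  # hypothetical name
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": ["lowercase"],
                }
            }
        }
    },
    "mappings": {
        "doc": {
            "properties": {
                "field1Suggest": dict(completion_field),
                "field2Suggest": dict(completion_field),
            }
        }
    },
}

# Suggest request using a prefix from the question's sample data.
suggest_body = {
    "suggest": {
        "field1": {  # hypothetical suggestion name
            "prefix": "AA:890",
            "completion": {"field": "field1Suggest"},
        }
    }
}

print(json.dumps(index_body, indent=2))
print(json.dumps(suggest_body, indent=2))
```

If this works for your data, compare the index size against both the simple analyzer and the edge ngram version on a small sample before reindexing everything.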

system · June 26, 2018, 7:02am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
