Memory usage of completion/context suggester

Hello,

I have a question about the context suggester implementation in Elasticsearch 8.3.1.

I have this mapping:

{
  "mappings": {
    "properties": {
      "suggest": {
        "type": "completion",
        "contexts": [
          {
            "name": "scenario",
            "type": "category"
          }
        ]
      }
    }
  }
}

I then indexed the same suggestion input several times, once per scenario, with a different weight each time. For example:

PUT http://localhost:9200/entities/_doc/1:UK
{
  "suggest": [
    {
      "input": "Big Ben",
      "weight": 10,
      "contexts": {
        "scenario": ["UK"]
      }
    }
  ]
}

PUT http://localhost:9200/entities/_doc/1:US
{
  "suggest": [
    {
      "input": "Big Ben",
      "weight": 2,
      "contexts": {
        "scenario": ["US"]
      }
    }
  ]
}
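
At query time I filter by scenario, roughly like this (the suggester name "entity-suggest" and the prefix are just for illustration):

POST http://localhost:9200/entities/_search
{
  "suggest": {
    "entity-suggest": {
      "prefix": "big b",
      "completion": {
        "field": "suggest",
        "contexts": {
          "scenario": ["UK"]
        }
      }
    }
  }
}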

I noticed that the memory usage reported for the completion FST (http://localhost:9200/entities/_stats/completion) increases linearly with the number of scenarios. For example, for a dataset of 14M entities (Wikidata), I see 3 GB with 1 scenario and 6 GB with 2 scenarios.
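
If I read the stats API docs correctly, the completion_fields parameter restricts the stats to a specific field, so I also check the per-field breakdown like this:

GET http://localhost:9200/entities/_stats/completion?completion_fields=suggest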

I indexed the entries for both scenarios in sequence, so I think they end up in the same segments.
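
To double-check the segment layout, I run something like:

GET http://localhost:9200/entities/_segments

and I can force everything into a single segment with:

POST http://localhost:9200/entities/_forcemerge?max_num_segments=1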

If everything goes into a single FST, why can't it share prefixes between entries from different scenarios (which have the exact same input), so that memory does not scale linearly with the number of scenarios?

Thank you!

