Hello,
I have a question about the context suggester implementation in Elasticsearch 8.3.1.
I have this mapping:
{
  "mappings": {
    "properties": {
      "suggest": {
        "type": "completion",
        "contexts": [
          {
            "name": "scenario",
            "type": "category"
          }
        ]
      }
    }
  }
}
And I indexed the same suggestion inputs several times, with a different weight per scenario. Example:
PUT http://localhost:9200/entities/_doc/1:UK
{
  "suggest": [
    {
      "input": "Big Ben",
      "weight": 10,
      "contexts": {
        "scenario": ["UK"]
      }
    }
  ]
}
PUT http://localhost:9200/entities/_doc/1:US
{
  "suggest": [
    {
      "input": "Big Ben",
      "weight": 2,
      "contexts": {
        "scenario": ["US"]
      }
    }
  ]
}
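For reference, this is roughly how I query the suggester, filtering on the scenario context (the suggestion name "entity-suggest" and the prefix "big" here are just illustrative):

POST http://localhost:9200/entities/_search
{
  "suggest": {
    "entity-suggest": {
      "prefix": "big",
      "completion": {
        "field": "suggest",
        "contexts": {
          "scenario": ["UK"]
        }
      }
    }
  }
}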
And I noticed that the memory used by the completion FST (as reported by http://localhost:9200/entities/_stats/completion) grows linearly with the number of scenarios. For example, a dataset of 14M entities (Wikidata) takes 3 GB with 1 scenario and 6 GB with 2 scenarios.
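Concretely, I am reading the size from the stats endpoint; trimmed down to the relevant part, the response looks roughly like this (the byte values are illustrative):

GET http://localhost:9200/entities/_stats/completion?human
{
  "_all": {
    "primaries": {
      "completion": {
        "size": "3gb",
        "size_in_bytes": 3221225472
      }
    }
  }
}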
I indexed the entries from both scenarios in sequence, so I think they end up in the same segments.
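To rule segmentation out, I assume force-merging down to a single segment before reading the stats would show the same numbers:

POST http://localhost:9200/entities/_forcemerge?max_num_segments=1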
And if this is all backed by a single FST, why is it not able to share the common prefixes between entries of different scenarios (with the exact same input), so that memory does not scale linearly with the number of scenarios?
Thank you!