Context Suggester: FST size estimate (RAM)

I've heard about the high memory (heap) occupancy of FSTs.
Is there a rule of thumb I can use to estimate the size of the FST / heap usage given the amount of data that will be fed as input to the suggester? Any benchmarking studies focusing on this are also welcome.

This will help us decide between the completion suggester and other, less memory-intensive approaches.

We're trying to evaluate the completion suggester for our use case. The fields to autocomplete are short, consisting of 4 words at most.
We're tokenizing them ourselves at indexing time and indexing the array as input, so as to support search on any token (not just the prefix); roughly as in the sketch below.
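
Something like this, with a hypothetical index and field name (the individual tokens plus the full phrase are all indexed as inputs):

```
PUT titles/_doc/1
{
  "title_suggest": {
    "input": ["quick", "brown", "fox", "quick brown fox"]
  }
}
```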

It depends on how you are using the completion suggester, but there is an easy way to monitor its size by using the index stats, as in the example below.
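
For instance (index name is a placeholder), the completion metric of the index stats API reports the suggester size per index:

```
GET titles/_stats/completion
```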

Hey @spinscale, thanks for the reply!

We're only enabling completion on one field, which consists of at most 4 words, and we also limit max_input_length to 50 chars; see the mapping sketch below.
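
Roughly this mapping (index and field names are placeholders):

```
PUT titles
{
  "mappings": {
    "properties": {
      "title_suggest": {
        "type": "completion",
        "max_input_length": 50
      }
    }
  }
}
```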

I checked both the node stats and the index stats for the "completion" field (the two calls are below).
The node stats clearly document "size_in_bytes" as memory usage (not disk).
However, for the index stats, I'm not sure whether "size_in_bytes" refers to RAM or disk.
It was 96 MB for a single shard (192 MB for an index with 2 shards).
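
These are the two calls I compared (index name is a placeholder):

```
# node-level view, documented as memory usage
GET _nodes/stats/indices/completion

# index-level view, where the 96 MB per shard shows up
GET titles/_stats/completion
```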

We had 10 indices on which I had indexed a completion field. These 10 are just clones of each other, and I only use one of them for my queries.
However, each of them occupies the same amount of memory. Does this mean there is no optimization based on frequency of usage?
Are the FSTs of all indices always kept in memory?
I also see that this number cleanly doubles with 2 shards per index.

I'd assume that if it is memory, it's probably all on the heap, although the heap stats don't have a breakdown for completion-related fields.

Indeed, FSTs are kept in memory on the heap (loaded from disk). There is no optimization based on usage; this data structure exists as is, independent of your usage.

Thanks for the clarification, @spinscale.
So I read through all the metrics again, and my understanding is that metrics with just "size_in_bytes" are actually disk size metrics, while metrics with "memory" in the name, like "memory_in_bytes" or "terms_memory_in_bytes", are RAM metrics.
Going by that logic, since the "completion" field has just "size_in_bytes", it should be a disk metric, right? (Or am I wrong?)
If it is a disk size metric, I did not find any metric with "memory" in the name that corresponds to "completion". I can see "terms_memory_in_bytes", "stored_fields_memory_in_bytes", "term_vectors_memory_in_bytes" and "norms_memory_in_bytes", but they don't seem to be specific to completion. Could you point me to the memory metric for that?

Hey,

so when digging through the code: the stats calculation loads the FST data structure into the heap and then uses that for the size calculation. So this is indeed memory based.
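
As a rough sanity check (not an exact breakdown), you can compare the per-node completion figure against overall JVM heap usage:

```
# completion size per node
GET _nodes/stats/indices/completion

# JVM heap usage per node
GET _nodes/stats/jvm
```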

--Alex

