I'm a little confused by the new sizing documentation that was written for the 8.3 release. It mentions leaving 1 kB of heap available per field in an index. Does this mean 1 kB of heap should be available per field, per document, or per index? For instance, say an index is 1 GB with 4,821,000 documents, each containing roughly 45 fields (it's a custom log source with a KV processor in the ingest pipeline, so it fluctuates a bit). That would mean there are 216,945,000 fields within this index (4,821,000 * 45 = 216,945,000). At 1 kB each, that comes to 216,945,000 kB / 1000 = 216,945 MB, and 216,945 MB / 1000 = 216.945 GB of heap memory, which seems excessive. Is there something I am misunderstanding in this calculation? Has the documentation failed me (not for the first time)? Any assistance is greatly appreciated!
The exact resource usage of each mapped field depends on its type, but a rule of thumb is to allow for approximately 1kB of heap overhead per mapped field per index held by each data node
Per mapped field in the mapping, not the total number of field occurrences across all documents stored in the index.
So if an index has 250 mapped fields, that would be about 250 kB of heap.
The sample calculation in the docs shows this:
For example, if a data node holds shards from 1000 indices, each containing 4000 mapped fields, then you should allow approximately 1000 × 4000 × 1kB = 4GB of heap for the fields and another 0.5GB of heap for its workload and other overheads, and therefore this node will need a heap size of at least 4.5GB.
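Purely as an illustration (the function name and units here are mine, not from the docs), a small Python sketch of that rule of thumb, applied both to the docs example and to the index in the question assuming its mapping settles at roughly 45 mapped fields:

```python
# Rule of thumb from the sizing docs: ~1 kB of heap per mapped field,
# per index held by the data node. Decimal units to match the docs' arithmetic.
HEAP_PER_MAPPED_FIELD_BYTES = 1_000

def mapping_heap_gb(indices_on_node: int, mapped_fields_per_index: int) -> float:
    """Rough heap estimate (in GB) for field mappings on a single data node."""
    return indices_on_node * mapped_fields_per_index * HEAP_PER_MAPPED_FIELD_BYTES / 1e9

# Docs example: 1000 indices x 4000 mapped fields -> ~4 GB
# (plus ~0.5 GB for workload and other overheads).
print(mapping_heap_gb(1000, 4000))  # 4.0

# The index from the question: ~45 mapped fields; the 4,821,000 documents
# don't enter into it at all.
print(mapping_heap_gb(1, 45))       # 4.5e-05 GB, i.e. about 45 kB
```

The key point the sketch makes is that the document count never appears in the formula; only the number of indices on the node and the size of each index's mapping do.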