Hi, I have some questions regarding the memory usage and storage of documents in Elasticsearch.
Let's say I have a JSON document of size X KB. If I ship it to Elasticsearch with Logstash and it is parsed as JSON, what size will it be once it's inside Elasticsearch? How much overhead will be added?
I understand that this depends on many configuration options, but assuming I keep the default settings, what disk usage should I expect if I want to store millions of documents?
Also, what is the actual size of the Elasticsearch files themselves?
Is there a difference between data nodes and master nodes when it comes to file size and document storage?
I'm asking these questions in order to understand the system requirements, in terms of memory and disk, for installing a new Elastic Stack.
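For reference, this is roughly how I plan to check the actual on-disk size once a sample is indexed (the host and index name below are just placeholders for my setup):

```
# Document count and store size for a sample index, as reported by Elasticsearch.
curl -s 'localhost:9200/_cat/indices/my-sample-index?v&h=index,docs.count,pri.store.size,store.size'

# A more detailed breakdown (store and segment stats) for the same index.
curl -s 'localhost:9200/my-sample-index/_stats/store,segments?pretty'
```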
Hi, I tried that, but I can't tell which part of Elasticsearch adds which overhead component, i.e. why the index ends up at the size it does after the document is ingested into Elasticsearch.
Also, when I run du -h on my CentOS machine I don't see the same number that I see in _stats.
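To be concrete, this is the kind of comparison I'm making (the data path is the default for an RPM install and the index name is just an example). I understand that the _stats totals include replica shards across the whole cluster while du only sees the local node, which may explain part of the difference:

```
# Store size for one index as reported by the _stats API (see the size_in_bytes fields).
curl -s 'localhost:9200/my-index/_stats/store?pretty'

# On-disk usage of this node's index data directories.
# Default path.data for an RPM install; on recent versions the per-index
# directories are named by index UUID, not index name.
sudo du -h --max-depth=1 /var/lib/elasticsearch/nodes/0/indices/
```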
There used to be a rule of thumb saying the index size is around 10% of your data, but it's almost impossible to estimate today given doc_values, the various analyzers, and whether or not you use best_compression.
I usually go with a 1:1 estimate in sizing exercises to stay on the safe side. The actual size will usually be smaller than that, but it's a good place to start.
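If disk usage ends up being the main concern, the best_compression codec mentioned above is one knob worth knowing about. A minimal example of enabling it at index creation (the index name is just an example):

```
# Create an index using the DEFLATE-based best_compression codec,
# which shrinks stored fields (including _source) at some CPU cost.
curl -s -X PUT 'localhost:9200/my-new-index' \
  -H 'Content-Type: application/json' \
  -d '{
        "settings": {
          "index": { "codec": "best_compression" }
        }
      }'
```

It only applies to newly written segments, so existing data isn't recompressed until segments are merged.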