How much overhead stuff is stored with each document?

Hi, I have some questions regarding the memory usage and storage of documents in Elasticsearch.
Let's say I have a JSON document of X KB. If I ship it to Elasticsearch with Logstash and index it as JSON, what size will it take up once it is inside Elasticsearch? How much overhead will be added?
I understand that this depends on multiple configurations, but assuming I keep the default settings, what disk usage should I expect if I want to store millions of documents?

Also, what is the actual size of the Elasticsearch files themselves?

Is there a difference between data nodes and master nodes regarding file size and doc storage size?

I'm asking these questions in order to understand what the system requirements will be, memory- and storage-wise, when installing a new Elastic Stack.

Try it with a part of your data and you will get an idea.
It should not take that long and it will be much better than guessing.
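For example, something along these lines should give you a number quickly (the index name, the localhost:9200 endpoint, and sample.ndjson are just placeholders for your setup; the file needs to be in the bulk API format, with an action line before each document):

    # load a sample of your documents into a throwaway index
    curl -s -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/sizing-test/_bulk' --data-binary @sample.ndjson

    # optional: merge down to one segment so the on-disk size settles
    curl -s -XPOST 'localhost:9200/sizing-test/_forcemerge?max_num_segments=1'

    # compare the reported store size to the size of your raw sample
    curl -s 'localhost:9200/_cat/indices/sizing-test?v&h=index,docs.count,store.size'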

Hi, I tried that, but I can't tell which part of Elasticsearch adds which overhead component, so I can't explain why the data ends up at the size it does after ingestion.
Also, when I run du -h on my CentOS machine, I don't see the same number that _stats reports.
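For reference, this is roughly what I am comparing (the index name is just an example, and /var/lib/elasticsearch is the default path.data on my machine):

    # store size as Elasticsearch reports it for one index
    curl -s 'localhost:9200/my-index/_stats/store?pretty'

    # on-disk usage of the whole data directory
    du -sh /var/lib/elasticsearch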

How many documents did you try? 1 million?

There used to be a rule of thumb saying index size is around 10% of your data, but it's almost impossible to estimate today given doc_values, the analyzers you use, and whether or not you use best_compression.
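If disk usage is your main concern, best_compression alone can make a noticeable difference. A minimal sketch of enabling it at index creation time (the index name and endpoint are placeholders):

    curl -s -H 'Content-Type: application/json' -XPUT 'localhost:9200/my-index' -d '
    {
      "settings": {
        "index.codec": "best_compression"
      }
    }'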

I usually go with a 1:1 estimation in sizing procedures to stay on the safe side. It will usually be smaller than that, but it's a good place to start.
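For example, if you expect 10 million documents averaging 2 KB of raw JSON each, that's roughly 20 GB of source data, so a 1:1 estimate means budgeting about 20 GB of index storage for the primary copies, and double that if you keep one replica.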

Dedicated master nodes don't hold index data, so they have no particular disk size requirements.

HTH,

--

Itamar Syn-Hershko
Elasticsearch Partner
Founder, CTO BigData Boutique
http://bigdataboutique.com
http://code972.com | @synhershko
