How much overhead stuff is stored with each document?

Hi, I have some questions regarding the memory usage and storage of documents in Elasticsearch.
Let's say I have a JSON document of X KB. If I ship it to Elasticsearch with Logstash and index it as JSON, what size will it take up once it is inside Elasticsearch? How much overhead will be added?
I understand that this depends on multiple configurations, but assuming I keep the default settings, what disk usage should I expect if I want to store millions of documents?

Also, what is the actual size of the Elasticsearch files themselves?

Is there a difference between data nodes and master nodes regarding file size and doc storage size?

I'm asking these questions in order to understand what the system requirements will be, memory- and storage-wise, when installing a new Elastic Stack.

Try it with a part of your data and you will get an idea.
It should not take that long and it will be much better than guessing.
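For example, something along these lines should give you a number quickly (the index name, the localhost:9200 endpoint, and sample.ndjson are just placeholders for your setup; the file needs to be in the bulk API format, with an action line before each document):

    # load a sample of your documents into a throwaway index
    curl -s -H 'Content-Type: application/x-ndjson' -XPOST 'localhost:9200/sizing-test/_bulk' --data-binary @sample.ndjson

    # optional: merge down to one segment so the on-disk size settles
    curl -s -XPOST 'localhost:9200/sizing-test/_forcemerge?max_num_segments=1'

    # compare the reported store size to the size of your raw sample
    curl -s 'localhost:9200/_cat/indices/sizing-test?v&h=index,docs.count,store.size'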

Hi, I tried that, but I can't tell which part of Elasticsearch adds which overhead component, so I can't explain why the data ends up at the size it does after ingestion.
Also, when I run du -h on my CentOS machine, I don't see the same number that _stats reports.
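For reference, this is roughly what I am comparing (the index name is just an example, and /var/lib/elasticsearch is the default path.data on my machine):

    # store size as Elasticsearch reports it for one index
    curl -s 'localhost:9200/my-index/_stats/store?pretty'

    # on-disk usage of the whole data directory
    du -sh /var/lib/elasticsearch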

How many documents did you try? 1 million?

There used to be a rule of thumb saying index size is around 10% of your data, but it's almost impossible to estimate today given doc_values, the analyzers you use, and whether or not you use best_compression.
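If disk usage is your main concern, best_compression alone can make a noticeable difference. A minimal sketch of enabling it at index creation time (the index name and endpoint are placeholders):

    curl -s -H 'Content-Type: application/json' -XPUT 'localhost:9200/my-index' -d '
    {
      "settings": {
        "index.codec": "best_compression"
      }
    }'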

I usually go with a 1:1 estimation in sizing procedures to stay on the safe side. It will usually be smaller than that, but it's a good place to start.
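For example, if you expect 10 million documents averaging 2 KB of raw JSON each, that's roughly 20 GB of source data, so a 1:1 estimate means budgeting about 20 GB of index storage for the primary copies, and double that if you keep one replica.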

Dedicated master nodes don't hold index data, so they have no particular disk size requirements.

HTH,

--

Itamar Syn-Hershko
Elasticsearch Partner
Founder, CTO BigData Boutique
http://bigdataboutique.com
http://code972.com | @synhershko
