Shard Sizing question

iamrags · August 2, 2019, 3:08pm

Hello Group,
I am currently sizing my production cluster and have some questions. Please help me out with some pointers. Based on my research, a shard should not exceed 50 GB to perform optimally. Below is my scenario:

I have 5 nodes, each with a 100 GB SSD and 16 GB RAM
We will have about 450 GB of logs to process each month
We don't want to store these documents for long, as the raw files will be available in cold storage and can be re-indexed ad hoc if we ever need them
Based on these criteria, I am thinking of the following:
Create an index with 10 shards and 1 replica -- this means I need a minimum of about 1 TB of storage (450 * 2), correct?
Allow 8 GB RAM for heap on each node, making the total heap size 40 GB -- is this good enough, or will it create problems for GC?
Create an ILM policy to roll over after 1 month and delete the old index

Please let me know if these are good enough to begin with, or whether there are other things I need to consider.
thanks
rags
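
The arithmetic behind the proposed monthly index can be checked with a short sketch. It uses only the figures from the post above and assumes, for illustration, that the indexed size equals the raw log size (in practice it can be larger or smaller depending on the mapping and compression). The function name `monthly_plan` is hypothetical.

```python
# Figures taken from the post; the assumption that indexed size
# equals raw log size is a simplification for illustration only.
NODES = 5
DISK_PER_NODE_GB = 100
MONTHLY_PRIMARY_GB = 450
PRIMARY_SHARDS = 10
REPLICAS = 1


def monthly_plan(primary_gb, primary_shards, replicas):
    """Return (total_shards, gb_per_primary_shard, total_gb_on_disk)."""
    copies = 1 + replicas  # each primary plus its replicas
    total_shards = primary_shards * copies
    gb_per_shard = primary_gb / primary_shards
    total_gb = primary_gb * copies
    return total_shards, gb_per_shard, total_gb


total_shards, gb_per_shard, total_gb = monthly_plan(
    MONTHLY_PRIMARY_GB, PRIMARY_SHARDS, REPLICAS
)
cluster_disk_gb = NODES * DISK_PER_NODE_GB
fits = total_gb <= cluster_disk_gb

print(f"shards: {total_shards}, per shard: {gb_per_shard} GB")
print(f"needed: {total_gb} GB, available: {cluster_disk_gb} GB, fits: {fits}")
```

Under these assumptions, one month of data with a replica needs 900 GB, while the five nodes provide 500 GB in total, so a full monthly index with one replica would not fit on the cluster as described.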

Christian_Dahlqvist · August 3, 2019, 5:49am

Given that you have relatively little disk per node and a reasonably short retention period, I would recommend using daily indices, as that allows you to keep a rolling X number of days that fits the amount of storage you have. As long as you are using a single daily index, you can probably set up 5 primary shards to get an even write distribution, even though 1 would likely be sufficient.

If you have a replica, data will be stored twice for high availability, so you may need more disk space and/or a shorter retention period. Make sure you follow these guidelines.
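
As a rough sketch of how the "rolling X number of days" could be estimated from the numbers in this thread: the 30-day month, the 85% usable-disk fraction (chosen to mirror Elasticsearch's default low disk watermark), and the function name `max_retention_days` are assumptions for illustration, and indexed size is again assumed to equal raw log size.

```python
import math


def max_retention_days(cluster_disk_gb, daily_primary_gb, replicas, usable_fraction):
    """Whole days of daily indices that fit within the usable disk."""
    daily_total_gb = daily_primary_gb * (1 + replicas)  # replicas store the data again
    return math.floor(cluster_disk_gb * usable_fraction / daily_total_gb)


cluster_disk_gb = 5 * 100      # 5 nodes x 100 GB SSD (from the thread)
daily_primary_gb = 450 / 30    # assumes a 30-day month
usable_fraction = 0.85         # assumption: leave headroom below the disk watermark

days_with_replica = max_retention_days(cluster_disk_gb, daily_primary_gb, 1, usable_fraction)
days_without_replica = max_retention_days(cluster_disk_gb, daily_primary_gb, 0, usable_fraction)

print(f"with 1 replica: {days_with_replica} days")
print(f"without replica: {days_without_replica} days")
```

With these inputs, keeping one replica roughly halves the retention that fits on the same disks, which is the trade-off described above.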

[On the "(450 * 2)" storage figure:] I do not understand this. Can you please explain?

[On the 8 GB heap per node:] That is the recommended best practice, assuming nothing else is running on these nodes.

As mentioned, use daily indices instead, as that gives you better flexibility. Note that data is removed by deleting complete indices, so if you used monthly indices you would remove a month's worth of data at a time.
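
A minimal sketch of what an ILM policy for daily rollover plus whole-index deletion might look like, built as a Python dictionary and printed as JSON. The 14-day retention is an illustrative assumption, not a figure from this thread, and `build_ilm_policy` is a hypothetical helper. The resulting body would be sent to the `PUT _ilm/policy/<policy-name>` endpoint.

```python
import json


def build_ilm_policy(rollover_max_age, delete_after):
    """Build an ILM policy body: roll over in the hot phase, then delete."""
    return {
        "policy": {
            "phases": {
                "hot": {
                    "actions": {
                        # start a new index once the current one reaches this age
                        "rollover": {"max_age": rollover_max_age}
                    }
                },
                "delete": {
                    # the whole index is deleted, not individual documents
                    "min_age": delete_after,
                    "actions": {"delete": {}},
                },
            }
        }
    }


# Assumed values: daily rollover, 14 days of retention.
policy = build_ilm_policy("1d", "14d")
print(json.dumps(policy, indent=2))
```

With daily rollover, each deletion removes one day of data, whereas a one-month rollover would remove a month at a time.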

iamrags · August 5, 2019, 2:24pm

Thanks, Christian, for the info. What I meant by 450*2 was that if I use one replica and 10 primary shards, that's a total of 20 shards, correct? I use 10 shards to keep the shard size around 45 GB, as my total is 450 GB per month, and adding a replica for each primary means that I need double the 450 GB, which is close to 1 TB!

Also, the reason I initially planned for one large index was to keep the number of indices low, as I read on the blog that a large number of indices (and associated shards) can affect performance during queries, merges, and other behind-the-scenes Lucene processes. Let me know whether I shouldn't worry too much about the number of indices, given my short retention timeframe.
Also, to provide more info about these logs: they are harvested from IoT devices and have a specific mapping in the index.
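
To put the concern about index and shard counts in numbers, here is a small sketch comparing the single monthly index with daily indices. The 14-day retention is an illustrative assumption, `shard_counts` is a hypothetical helper, and the budget of about 20 shards per GB of heap is a rule of thumb from Elastic's sharding guidance, used here only as a rough yardstick.

```python
def shard_counts(indices, primary_shards, replicas, nodes):
    """Return (total_shards, shards_per_node) assuming an even spread."""
    total = indices * primary_shards * (1 + replicas)
    return total, total / nodes


NODES = 5
HEAP_GB_PER_NODE = 8          # from the thread
SHARDS_PER_GB_HEAP = 20       # assumption: rule-of-thumb upper bound

monthly = shard_counts(indices=1, primary_shards=10, replicas=1, nodes=NODES)
daily = shard_counts(indices=14, primary_shards=5, replicas=1, nodes=NODES)  # 14 days assumed
budget_per_node = HEAP_GB_PER_NODE * SHARDS_PER_GB_HEAP

print(f"monthly index: {monthly[0]} shards, {monthly[1]} per node")
print(f"daily indices: {daily[0]} shards, {daily[1]} per node")
print(f"rule-of-thumb budget per node: {budget_per_node}")
```

Under these assumptions, daily indices produce more shards than the single monthly index, but the per-node count stays well below the rule-of-thumb budget for an 8 GB heap.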

