ElasticSearch Size Recommendation

TLDR: what specs are appropriate for client, data, and master nodes when ingesting 250GB/day?

Hi All,
I've been tasked with building out an ELK Stack, as my company would like to move away from Splunk. We've already begun using the ELK Stack template that AWS provides, but we would like more control over the configuration. With that said, I've been reading a lot of documentation and think I have a good idea of the specs for each node, but I still wanted to reach out to the community in case someone has a more definitive answer. This clustered ELK environment would ingest around 250GB/day, possibly growing in the near future. This is what I was thinking:
11 total nodes
2 client nodes
3 data nodes
3 master nodes
1 Kibana
2 Logstash

From my reading I have learned that the master nodes don't require much RAM or disk, so I figure maybe 8GB of RAM and 50GB of disk?
From my reading I have learned that the data nodes work best with 64GB of RAM but still work well with 32GB. I was thinking maybe 2TB of disk for each data node, with at least 1 year of retention on indexes.
I am not at all sure what the specs should be for the client nodes.

As each shard holds a finite amount of data and comes with some overhead in terms of memory, file handles, and CPU, the more heap you have on a node, the more data it can hold. As we generally recommend that heap be < 32GB and no more than 50% of available RAM, 64GB of RAM per node is often considered the sweet spot.
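A minimal sketch of that rule of thumb in Python, assuming the ~32GB compressed-pointers cutoff (the exact threshold varies by JVM, so 31GB is used here as a safe ceiling):

```python
def recommended_heap_gb(ram_gb: float) -> float:
    """Suggested Elasticsearch heap: 50% of RAM, capped just below 32GB
    so the JVM can keep using compressed object pointers."""
    return min(ram_gb / 2, 31.0)  # 31GB leaves margin under the ~32GB cutoff

for ram in (32, 64, 128):
    print(f"{ram}GB RAM -> {recommended_heap_gb(ram):.0f}GB heap")
```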

As dedicated master nodes are not serving traffic and just manage the cluster, they generally just need a few CPU cores and 4-8GB RAM. As they do not hold data, heap can be set to 75% of the available host memory. Client nodes may be useful, but are generally not necessary for a lot of logging use cases.
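If you want to double-check which roles each node actually came up with, the nodes info API exposes them. A small sketch using the official elasticsearch-py client (the endpoint URL is an assumption; point it at your own cluster):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])  # assumed endpoint; adjust to your cluster

# The nodes info API reports the roles (master, data, ingest) each node runs with;
# a node with an empty roles list is acting as a pure coordinating/client node.
for node_id, info in es.nodes.info()["nodes"].items():
    print(info["name"], info.get("roles", []))
```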

If you have 250GB per day and want to keep that for 1 year, that corresponds to around 90TB of raw data. Based on that I would expect you to need more disk space on the data nodes as well as a larger number of data nodes. Exactly how much space that amount of data will take up on disk once indexed will largely depend on how you optimise your mappings. Although it is getting a bit old, this blog post illustrates the effect different mappings can have.
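The back-of-the-envelope maths behind that, as a sketch; the expansion factor, replica count, and watermark headroom are assumptions you should replace with measurements from your own mappings:

```python
daily_gb = 250
retention_days = 365

index_expansion = 1.1      # assumed on-disk vs raw ratio; depends heavily on mappings
replicas = 1               # one replica doubles the footprint
watermark_headroom = 0.85  # keep free space below the disk watermarks

raw_tb = daily_gb * retention_days / 1024
on_disk_tb = raw_tb * index_expansion * (1 + replicas)
provisioned_tb = on_disk_tb / watermark_headroom

print(f"raw: {raw_tb:.0f}TB  indexed + replicated: {on_disk_tb:.0f}TB  "
      f"to provision: {provisioned_tb:.0f}TB")
```

With 2TB per data node, that works out to dozens of data nodes rather than three, which is why both the disk per node and the node count would likely need to grow.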


Hi Christian_Dahlqvist,

Thank you for that information. What's the difference between a client node and a coordinating node? I believe they are the same, correct?

Yes, they are the same.


Also, what about an ingest node? Sorry, I found a new document (there are so many ;-) )... Where does the ingest node come into play? Is it the same as Logstash?

The ingest node is a new node type in Elasticsearch 5.x which allows you to transform indexing requests before the documents are written to Elasticsearch. It supports a subset of the functionality available in Logstash and can allow for a simpler architecture in some cases. If you decide to use them, you should probably use dedicated ingest nodes, as they, like Logstash, can be CPU intensive.
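To make that concrete, here is a hedged sketch of defining and using an ingest pipeline through the official elasticsearch-py client; the pipeline id, index name, and field contents are made up for the example:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])  # assumed endpoint; adjust to your cluster

# A hypothetical pipeline that groks an apache-style log line at ingest
# time and then drops the raw message field.
es.ingest.put_pipeline(
    id="parse-weblogs",  # example name
    body={
        "description": "Parse web access logs on the ingest node",
        "processors": [
            {"grok": {"field": "message", "patterns": ["%{COMMONAPACHELOG}"]}},
            {"remove": {"field": "message"}},
        ],
    },
)

# Indexing requests opt in per request via the pipeline parameter:
es.index(
    index="weblogs-2017.01.01",
    doc_type="log",  # doc types were still current in the 5.x era
    pipeline="parse-weblogs",
    body={"message": '127.0.0.1 - - [01/Jan/2017:00:00:00 +0000] "GET / HTTP/1.1" 200 1234'},
)
```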


OK, got it. Thanks again. I will take the information you provided and begin building!
