Appropriate Cluster Setup

Hello,

I need to set up an Elasticsearch cluster under the following conditions and may have to recalculate the hardware resources, even if some people won't like that...

Conditions:
512 GB RAM, 26 CPU cores, traditional text-based logs, around 3.5 TB per 7 days. Fast queries are only required for the last 3 days.

The setup I have worked out so far looks like this:

  • 6 machines with 64 GB RAM each (31 GB allocated to the Elasticsearch heap) and 4 CPU cores as data nodes (all master eligible).
  • 1 machine with 16 GB RAM (8 GB allocated to the Elasticsearch heap) and 2 cores for one "master only" node, Kibana, and Logstash
  • 8 TB of disk space in total
  • one active index, rolled over to a new index with 6 shards once it reaches 200 GB, which makes roughly 33 GB per shard (rough sketch of the lifecycle calls after this list)
  • shrinking to one shard after 3 days to free up some resources
  • force-merging after 5 days to reduce disk usage
  • deleting after 7 days
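Roughly, I picture driving this lifecycle with calls like the following (the index names and write alias are only placeholders, the max_size rollover condition needs a reasonably recent Elasticsearch version, and a tool like Curator could automate the time-based steps):

POST /logs-write/_rollover
{
  "conditions": { "max_size": "200gb" }
}

# after 3 days: shrink from 6 shards down to 1
# (the source index must first be made read-only and have all shards on a single node)
POST /logs-000001/_shrink/logs-000001-shrunk
{
  "settings": { "index.number_of_shards": 1 }
}

# after 5 days: force merge to fewer segments to reclaim disk space
POST /logs-000001-shrunk/_forcemerge?max_num_segments=1

# after 7 days: delete the index
DELETE /logs-000001-shrunk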

1.) Is the index slicing OK?
2.) Do I need more memory? I have read best practices saying you need as much RAM as the total indexed data size.
3.) Do I need more CPUs on the nodes, especially on the master node, since Kibana and Logstash are running there?
4.) Are 8 TB of total disk space enough?

Sounds reasonable. Make sure you set minimum_master_nodes to 4 if you have 6 or 7 master eligible nodes.
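For reference, the usual formula is (number of master-eligible nodes / 2, rounded down) + 1, which gives 4 for both 6 and 7 master-eligible nodes. On pre-7.x versions with Zen discovery this goes into elasticsearch.yml on every master-eligible node:

discovery.zen.minimum_master_nodes: 4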

As your data nodes will also be master eligible, I would instead make this a coordinating only node, as it is common to have one of those next to Kibana. The amount of CPU and RAM might be OK as long as you do not run Logstash on the node.

As you look to ingest around 500GB of logs per day, I would recommend a couple of nodes dedicated to Logstash.

Assuming you are using 1 replica and have optimised your mappings at least to some extent, that sounds reasonable.
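If it helps, the shard and replica counts can be pinned in an index template, roughly like this (template name and pattern are placeholders; on 5.x the pattern field is called template rather than index_patterns):

PUT _template/logs
{
  "index_patterns": ["logs-*"],
  "settings": {
    "index.number_of_shards": 6,
    "index.number_of_replicas": 1
  }
}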

Check that query performance against shards that size is acceptable as this depends a lot on your data and queries/dashboards. Given that you have a relatively short retention period I would probably use the rollover API and cut indices so shards are a bit smaller than that size, possibly around 15GB - 20GB or so.
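As a rough worked example, assuming you keep 6 primary shards per index: rolling over at around 120 GB gives roughly 120 / 6 ≈ 20 GB per shard, and rolling over at 100 GB gives roughly 17 GB per shard, both within that range.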

Given the short overall retention period I do not think this is necessary.

Given the short retention period this may not be necessary.


Hi Christian,

first of all, thank you for your help. I appreciate it very much!

Sounds reasonable. Make sure you set minimum_master_nodes to 4 if you have 6 or 7 master eligible nodes.

Will be done, thanks.

As your data nodes will also be master eligible, I would instead make this a coordinating only node, as it is common to have one of those next to Kibana. The amount of CPU and RAM might be OK as long as you do not run Logstash on the node.

Coordinating node = client node, i.e.:
node.master: false
node.data: false
node.ingest: false
search.remote.connect: false
Is that right?

With Logstash running on it, should I go with 4 cores and 32 GB RAM?

As you look to ingest around 500GB of logs per day, I would recommend a couple of nodes dedicated to Logstash.

Since I cannot tell at the moment how much data will be passed through Logstash (anything from everything down to a small fraction is possible), I will provide a Logstash instance per node. But I assume the currently planned hardware resources are sufficient for that, or am I wrong?

Check that query performance against shards that size is acceptable as this depends a lot on your data and queries/dashboards. Given that you have a relatively short retention period I would probably use the rollover API and cut indices so shards are a bit smaller than that size, possibly around 15GB - 20GB or so.

I will roll over at 120GB!

I would still recommend a separate host so they do not interfere with each other. Exactly how many resources you will need depends on what kind of processing you do. Generally avoid hosting Logstash on the same hosts as Elasticsearch, as it can make it tricky to troubleshoot any performance issues.

I would still recommend a separate host so they do not interfere with each other. Exactly how many resources you will need depends on what kind of processing you do. Generally avoid hosting Logstash on the same hosts as Elasticsearch, as it can make it tricky to troubleshoot any performance issues.

Logstash will not have much work to do because 90% of the data is already structured via serilog. Nevertheless, I would still like to run the data through a pipeline, because it keeps me flexible in terms of indexing: there are many application instances that would not have to be reconfigured if something changes in the indexing process.
Therefore, I think a separate server hosting only Logstash could suffice (2 cores, 16 GB RAM)?
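Roughly, I picture a thin pipeline along these lines (port, codec and index name are only placeholders, since the applications already ship structured JSON via serilog):

input {
  tcp {
    port  => 5044
    codec => json
  }
}
filter {
  # hardly any parsing needed, the events arrive already structured
}
output {
  elasticsearch {
    hosts => ["http://coordinating-node:9200"]
    index => "logs-write"    # write alias used for the rollover
  }
}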
Is this procedure basically acceptable?
Are the hardware resources sufficient?

Thanks again for your help!!

Hello again,

Do you think decreasing the number of cores on the data nodes would have a major performance impact?
I am thinking of using 2 cores instead of 4 per data node...

Thanks again

2 cores for a 64 GB node sounds like very little in my opinion.
