Hardware requirements for production cluster

Hi there,

This has likely been discussed many times, so apologies in advance to any veteran Elastic members! As someone reasonably new to the Stack (despite being a forum member for a while now), I'd like to get some advice before committing to the purchase of new hardware. We currently have a small "proof of concept" setup, which consists of:

1x "Hot" data node
- 10 Core Intel Gold CPU
- 64GB RAM
- SSD storage ~12TB

1x "Warm" data node
- 10 Core Intel Gold CPU
- 64GB RAM
- HDD Storage ~48TB

1x Logstash server
- 10 Core Intel Gold CPU
- 32GB RAM
- SSD Storage

1x Kibana LXC container
- 4GB RAM
- Shared Storage

I'd like to scale this out for production to look like:

3x "Hot" data node
- 10 Core Intel Gold CPU
- 64GB RAM
- SSD storage ~12TB

3x "Warm" data node
- 10 Core Intel Gold CPU
- 64GB RAM
- HDD Storage ~48TB

3x Master nodes server
- Quad Core Intel CPU
- 16GB RAM
- HDD Storage

2x Logstash server
- 10 Core Intel Gold CPU
- 32GB RAM
- SSD Storage

My first question is this: does the above look appropriate for a cluster ingesting around 20-30GB per day, with a view to expanding this to include system metrics, audit logs and heartbeat checks? I appreciate this is a difficult question to answer, so I'm approaching it from the perspective of data ingestion rate vs hardware resources rather than storage capacity.

Secondly I am considering adding 3x Coordinator nodes which will also run Kibana pointed to localhost. These will sit behind HAProxy loadbalancers to provide load balancing and high availability.

My understanding is that this will allow the "Scatter" and "Gather" phases to be performed on the coordinator nodes rather than on a nominated data node. Am I correct in assuming that, without the coordinators, Kibana would only be able to point to a single data node to perform queries?
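For readers unfamiliar with the terms, the "Scatter" and "Gather" phases mentioned above can be pictured with a toy sketch. This is not Elasticsearch code: the shard layout, the term-count scoring and the function names `search_shard` and `coordinate` are all made up for illustration. In Elasticsearch, whichever node receives a search request plays the coordinating role for that request.

```python
import heapq

def search_shard(shard_docs, term, size):
    """Query one shard and return its local top `size` hits as (score, doc_id)."""
    hits = [
        (doc["text"].split().count(term), doc["id"])  # toy score: term frequency
        for doc in shard_docs
        if term in doc["text"].split()
    ]
    return heapq.nlargest(size, hits)

def coordinate(shards, term, size):
    """Scatter the query to every shard, then gather and merge the partial results."""
    partial_results = [search_shard(docs, term, size) for docs in shards]  # scatter
    all_hits = (hit for hits in partial_results for hit in hits)
    merged = heapq.nlargest(size, all_hits)  # gather: global top `size`
    return [doc_id for _score, doc_id in merged]

# Two hypothetical shards holding a handful of log lines.
shards = [
    [{"id": "a", "text": "error error timeout"}, {"id": "b", "text": "ok"}],
    [{"id": "c", "text": "error"}, {"id": "d", "text": "error error error"}],
]
top_two = coordinate(shards, "error", 2)
print(top_two)
```

The merge step is where a coordinating node spends memory and CPU, which is why its sizing depends on how heavy the searches and aggregations are.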

My final question is this: what kind of resources should I be looking at for the coordinator nodes in relation to the rest of the stack? My thinking was something slightly more powerful than the Master nodes but not as "beefy" as the data nodes.

Appreciate any feedback
Steve

Only benchmarking with a realistic workload can say for sure, but my initial impression is that this sounds like a lot of power for the load you describe. Each of your SSD-based "hot" nodes already has enough storage for a year's worth of data at that rate. Do you need the "warm" tier at all? Do you need three data nodes or would two be enough?
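The "year's worth of data" estimate can be checked with some back-of-the-envelope arithmetic. The sketch below assumes decimal terabytes and that the on-disk index size is roughly equal to the raw ingest volume, which will not hold exactly in practice; the function name `retention_days` is just for illustration.

```python
def retention_days(capacity_tb: float, ingest_gb_per_day: float, replicas: int = 0) -> float:
    """Days of data a tier can hold, ignoring index overhead and compression."""
    capacity_gb = capacity_tb * 1000  # assumption: 1 TB = 1000 GB
    stored_per_day_gb = ingest_gb_per_day * (1 + replicas)  # primary plus replica copies
    return capacity_gb / stored_per_day_gb

# One 12TB hot node at the upper ingest estimate of 30GB/day, no replicas.
single_node_days = retention_days(12, 30)

# Three 12TB hot nodes (36TB total) with two replica copies of every shard.
three_node_days = retention_days(36, 30, replicas=2)

print(single_node_days, three_node_days)
```

Under these assumptions both cases come out at a little over a year, so the hot tier alone covers the stated ingest rate for that period.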

You focussed on ingest load rather than search. Does this mean you expect the search load to be relatively light? If so, again, 3x coordinating nodes sounds excessive. Perhaps you can just search on the data nodes directly (via your load balancer)? Again, only benchmarking with a realistic workload can say for sure.

It sounds like you can put a load balancer between your Kibana nodes and the Elasticsearch nodes, although the recommended architecture is to point each Kibana at a single node.

They won't need much disk but will use a decent amount of RAM if you are going to hit them hard with lots of heavy aggregations or other expensive searches.

Thank you for your detailed reply @DavidTurner

Do you need the "warm" tier at all? Do you need three data nodes or would two be enough?

The Hot/Warm architecture was already in place when I took over this project; however, based on your recommendations I will look at this again. On reflection it does appear to be excessive. I'm considering a three-node setup for the following two reasons:

  1. I'm thinking about using a RAID 0 array for Elasticsearch data configured with a replication factor of 2. I've based this on somewhat old documentation but the logic still seems valid: Hardware | Elasticsearch: The Definitive Guide [2.x] | Elastic
  2. We have an "N+2" philosophy for all production kit, and unless there is a good reason not to do this I'm inclined to keep it the same. We could absolutely start with two nodes and then scale up; my concern is that I would then not want to run a RAID 0 array.

I think the logic of N+2 again comes into play with regard to 3 nodes. I expect the search load to be light; however, the searches will usually involve massive numbers of results over large date ranges. Think along the lines of the number of times an IP has hit our servers over a 3 month period. (We average ~600 requests per second through our loadbalancers.)
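To put a rough number on "massive", here is a quick estimate of how many documents such a search would cover. It assumes one indexed document per request and treats 3 months as 90 days; both are simplifications, and `documents_in_window` is a hypothetical helper name.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def documents_in_window(requests_per_second: int, days: int) -> int:
    """Documents indexed over `days`, assuming one document per request."""
    return requests_per_second * SECONDS_PER_DAY * days

per_day = documents_in_window(600, 1)
per_three_months = documents_in_window(600, 90)

print(f"{per_day:,} documents per day")
print(f"{per_three_months:,} documents over 90 days")
```

At ~600 requests per second this works out to roughly 52 million documents per day and about 4.7 billion over the 90-day window.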

I did consider this approach; however, I thought that for a relatively low spend we would get the added benefit of "smart loadbalancing" (Scatter/Gather benefits) with coordinator nodes, which we wouldn't get with HAProxy.

Thank you for this, I'll keep this in mind.

Once again, appreciate your input :+1:
