How is this hardware selection for 3 node cluster

Hi All,

We currently run a production 3 node cluster internally at our company on a VERY resource constrained server. We run it under a docker swarm, and have all or our services etc.. also within the swarm, so the 3 hosts will have about 200 or so containers across them. Some containers are replicated, but most are single containers with a single purpose for data collection.

We are now looking to expand and upgrade this to a proper set of separated hosts.

I have been tasked with designing and ordering and building the systems, so am coming here to get some opinions on the build. (This will also be a 3 node cluster)

CPU: Ryzen 5 3600
RAM: 32GB (2 x 16GB sticks)
MOBO: Gigabyte B550M K
Storage: 1TB NVME

Our data ingestion isnt HUGE, our current cluster with ~1 year of data is less than 300GB overall across all 3 nodes ( think, more on that below).

I have designed this setup to allow 200% increse in current ingestion. We only realy need to retain 1 year of data. And we use it for reporting (we provide 12 month history in the reports, hence the 1YR retention). But thats about it.

In relation to indexes, we have 1 index per service per customer. currently the indexing is a total mess with about 300 (due to a badly coded service that created daily indexes for each service for each custoemr and was missed. This will be corrected with the new cluster and all those documents will be merged into the correct index. We have about 400MM documents and the Store size is 250gb and the pri. store is about 125gb in total.

Nearly everything in the cluster is set with 3 primary shards, and 1 replica shard. although, i think that i would move that to having 2 replicas, then all the data is fully spread across all nodes. Which i think leads to faster searching.

Suggestions or opinions? or anything that i should look at to provide some more info?



So you will have 3 of these proposed hosts? Or 3 nodes on the 1 host?

There will be 3 physical hosts with those specs, and 1 node on each. Unless there is something to be gained by running more nodes on each. But the main thing we wanted is to minimise resource constraints between them. So each node has its own storage.

I have run some tests and over 3 hours, we ingested 3000 documents / minute. to give you an idea of our load. It not alot currently, but we are wanting this to be a production system for many years to come, so wanting to build it to suit.

Given you are running on consumer level hardware that's a bit at odds with this. Chances are your requirements in 2 years will be different anyway!

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.