Many small machines vs fewer large machines

Hi there,

I am building a new heavy-throughput logging cluster, receiving about 150-200GB of logs a day across ~60 indices (though mostly weighted towards 2 or 3 of them).
We currently have a setup with 5 data nodes with 4 CPUs and 15GB of memory apiece. I'm trying to figure out what the performance considerations would be for that setup versus one with more, smaller machines (e.g. 10 nodes with 2 CPUs and 7.5GB of memory each).

Here are the ups and downs I can think of for each topology, based on my current understanding:

More Smaller Machines

  • Assuming the shard count is increased to match the machine count, faster indexing for the same volume of data (shard count is fixed per index at creation time; see the sketch after these lists).
  • Larger cluster state for master nodes to maintain.

Fewer Larger Machines

  • More memory per machine, so less risk of out-of-memory errors when a particularly large shard has to be loaded into memory. We do have some indices with primary shard sizes upwards of 10-20GB, so this is a pertinent consideration.
  • More shards living on the same node, which can lead to resource competition when many queries run concurrently.
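
For context, the per-index shard count is fixed when the index is created, so the "more, smaller machines" option really means creating indices along these lines. This is only a rough sketch; the index name, replica count, and cluster address are illustrative, and it assumes the Python requests library:

```python
# Rough sketch: create a daily index with one primary shard per data node
# (10 in the "more, smaller machines" case). The name, replica count, and
# cluster address are placeholders.
import requests

requests.put(
    "http://localhost:9200/logstash-2016.09.02",
    json={
        "settings": {
            "index.number_of_shards": 10,    # one primary per data node
            "index.number_of_replicas": 1,
        }
    },
).raise_for_status()
```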

These are pretty basic, but are there any other significant things to consider for each approach?

I know topology questions are particular to every use case, and that the best way to decide is through testing, which I plan to do, but I just want to make sure I understand which metrics I am stressing by pushing either approach.

More, larger machines could also be an option.

Thank you!

You may be better off combining many of the smaller indices into one index. Indices have a fixed overhead, and a bunch of small ones adds up, especially if you end up with one small index per day.
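
If it helps, here is a rough sketch of spotting consolidation candidates with the _cat/indices API. The cluster address and the 1GB threshold are placeholders, and it assumes the Python requests library:

```python
# Rough sketch: list index sizes so you can spot small indices that might be
# worth consolidating. Assumes an unsecured cluster at localhost:9200.
import requests

CLUSTER = "http://localhost:9200"  # placeholder address

# _cat/indices with format=json returns one record per index; bytes=b makes
# store.size a plain byte count.
resp = requests.get(
    f"{CLUSTER}/_cat/indices",
    params={"format": "json", "h": "index,pri,store.size", "bytes": "b"},
)
resp.raise_for_status()

SMALL = 1 * 1024 ** 3  # flag indices under ~1GB as candidates (illustrative)
for idx in sorted(resp.json(), key=lambda i: int(i["store.size"] or 0)):
    size = int(idx["store.size"] or 0)
    if size < SMALL:
        print(f"{idx['index']}: {idx['pri']} primaries, {size / 1024 ** 2:.0f} MB")
```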

The upcoming 5.0 has a feature (the shrink API) that lets you merge an index's shards down into a single shard, so you can more easily get away with scaling out the number of shards.
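
For reference, a minimal sketch of that flow against a 5.x cluster. The index names, node name, and target shard count are placeholders, and it assumes the Python requests library:

```python
# Rough sketch of the 5.x shrink flow; not a drop-in script.
import requests

CLUSTER = "http://localhost:9200"   # placeholder address
SOURCE = "logs-2016.09.01"          # placeholder source index
TARGET = "logs-2016.09.01-shrunk"   # placeholder target index

# 1. Block writes and pull a copy of every shard onto one node so the shrink
#    can work with local segment files.
requests.put(
    f"{CLUSTER}/{SOURCE}/_settings",
    json={
        "settings": {
            "index.routing.allocation.require._name": "node-1",  # placeholder node name
            "index.blocks.write": True,
        }
    },
).raise_for_status()

# 2. Shrink into a new index with fewer primaries; the target shard count must
#    be a factor of the source's shard count.
requests.post(
    f"{CLUSTER}/{SOURCE}/_shrink/{TARGET}",
    json={"settings": {"index.number_of_shards": 1}},
).raise_for_status()
```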

Every node needs to keep a copy of the cluster state around, though nodes don't deserialize the whole thing.
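
A quick, rough way to gauge how big the cluster state has grown is to fetch it and measure the serialized payload (just a sketch, assuming an unsecured cluster on localhost:9200 and the Python requests library):

```python
# Rough sketch: fetch the cluster state and report its serialized size.
import requests

resp = requests.get("http://localhost:9200/_cluster/state")
resp.raise_for_status()
state = resp.json()

print(f"serialized cluster state: {len(resp.content) / 1024:.0f} KB")
print(f"indices tracked in the state: {len(state['metadata']['indices'])}")
```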

We're constantly fighting this battle. In 2.x the scariest thing for OOMs is aggregations. In 5.0 we've added a proper circuit breaker for them, so they shouldn't cause an OOM; the offending request just fails.
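
As a stopgap on 2.x you can tighten the existing request breaker through the cluster settings API, though as noted above it doesn't fully account for aggregation memory until 5.0. The percentages below are purely illustrative, not recommendations:

```python
# Rough sketch: tighten circuit breaker limits via the cluster settings API.
import requests

requests.put(
    "http://localhost:9200/_cluster/settings",
    json={
        "persistent": {
            "indices.breaker.total.limit": "60%",    # parent breaker (illustrative)
            "indices.breaker.request.limit": "30%",  # per-request structures, e.g. aggs
        }
    },
).raise_for_status()
```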

The big thing that comes up with topology is usually disk speed and disk count. If you put two disks in each node the answer is fairly simple: RAID1 the system disk and RAID0 Elasticsearch's data directory. If you have more disks it gets more complex, because failures of a RAID0 array start to get uncomfortably common. So you play with things like multiple RAID0 arrays combined with Elasticsearch's multiple data path feature, or RAID10/RAID5/RAID6 with all the usual database-style tradeoffs.
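
If you do go the multiple-data-path route (several directories listed under path.data, typically one per RAID0 array), the per-path usage shows up in the node fs stats. A rough sketch of pulling them, assuming an unsecured cluster on localhost:9200 and the Python requests library:

```python
# Rough sketch: print disk usage for each data path on each node.
import requests

resp = requests.get("http://localhost:9200/_nodes/stats/fs")
resp.raise_for_status()

for node_id, node in resp.json()["nodes"].items():
    for path in node["fs"]["data"]:
        used = path["total_in_bytes"] - path["free_in_bytes"]
        pct = 100.0 * used / path["total_in_bytes"]
        print(f"{node['name']} {path['path']}: {pct:.0f}% used")
```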

And then you get the question of spinning disks vs SSDs. It is fairly common to do a hot/warm setup where some of your nodes have "hot" hardware (SSDs) and some are "warm" (spinning disks), and you use allocation filtering to put the indices you are writing to on the "hot" nodes and the ones you aren't on the "warm" nodes. There are lots of tradeoffs here too, like how many of each type of node you have, the disk layout of each, etc.

The reason we use "warm" for the spinning-disk nodes is that the data is still online and query-able. Elasticsearch keeps those indices open and searchable all the time, so searches can still be reasonably fast, but you get much more contention on the disk cache, and the disk cache matters much more because of spinning-disk latency.
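
A minimal sketch of the allocation-filtering side of that, assuming each node was started with a custom attribute (conventionally box_type: hot or box_type: warm) in elasticsearch.yml; the attribute name, index names, and cluster address are placeholders:

```python
# Rough sketch of hot/warm allocation filtering via index settings.
import requests

CLUSTER = "http://localhost:9200"
TODAY = "logstash-2016.09.02"      # placeholder: index currently being written to
LAST_WEEK = "logstash-2016.08.26"  # placeholder: index no longer being written to

# Pin the index we're actively writing to onto the SSD ("hot") nodes...
requests.put(
    f"{CLUSTER}/{TODAY}/_settings",
    json={"index.routing.allocation.require.box_type": "hot"},
).raise_for_status()

# ...and migrate an index we're done writing to over to the spinning-disk
# ("warm") nodes. Changing the setting triggers shard relocation.
requests.put(
    f"{CLUSTER}/{LAST_WEEK}/_settings",
    json={"index.routing.allocation.require.box_type": "warm"},
).raise_for_status()
```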
