Would dividing the same resources over more nodes improve performance?

A load test seems to show that the resources (CPU, RAM) on the data nodes aren't fully used. Would spreading the same resources over more data nodes help performance?

Elasticsearch performance is not necessarily limited by CPU and/or RAM. The most likely bottleneck is generally disk I/O, but network performance is also a possibility. I have also seen users unable to saturate clusters because they were not sending data with a sufficient level of concurrency.
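To illustrate the concurrency point, here is a minimal sketch of a load generator that keeps several batches in flight at once. It uses only the Python standard library; `send_batch` is a hypothetical callable that you would replace with whatever issues one bulk request in your test, and the batch size and worker count are arbitrary placeholders, not recommendations.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def send_concurrently(docs, send_batch, batch_size=500, workers=8):
    """Send `docs` in batches using `workers` parallel threads.

    `send_batch` is a hypothetical callable supplied by the caller; in a real
    load test it would wrap one bulk request to the cluster and return the
    number of documents it sent.
    """
    batches = list(chunked(docs, batch_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Up to `workers` batches are being sent at the same time.
        results = pool.map(send_batch, batches)
        return sum(results)


if __name__ == "__main__":
    # Stand-in sender that only counts documents; no network traffic involved.
    fake_docs = [{"message": f"log line {i}"} for i in range(2000)]
    print(send_concurrently(fake_docs, send_batch=len, batch_size=500, workers=4))
```

If CPU, disk and network on the data nodes all stay idle while a single-threaded sender is running, raising `workers` in a test like this is one way to check whether the client is the limiting factor.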

What is the size and specification of your cluster? What kind of hardware and storage are you using?

I'm new to the project and the topic of sizing/hardware is new to me, so I'm not sure I can give exact specs. I don't yet know which hardware and storage are in use.

In terms of nodes:
10 each of master, data and coordinator nodes. I need to add some ingestion nodes.
They all have 2 CPUs per node; master and coordinator nodes have 4 GB each, data nodes have 16 GB per node.

Sorry for the vagueness, I'm sharing what information I have right now.

That is a very unusual cluster configuration. You generally want exactly 3 dedicated master nodes. Having 10 is excessive and, as it is an even number, also potentially problematic. Dedicated master nodes do little work as they do not serve requests, so 2 CPUs and 4 GB is plenty. Most clusters do not need any dedicated coordinating-only nodes, so I would consider removing most or all of these. Data nodes need to be more powerful as they do almost all the work, so they should have more CPU resources allocated than the other node types.
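The remark about even numbers comes down to majority arithmetic. A minimal sketch, assuming that electing a master requires a strict majority of the master-eligible nodes (the function names are made up for illustration):

```python
def quorum(masters):
    """Smallest strict majority of `masters` master-eligible nodes."""
    return masters // 2 + 1


def tolerated_failures(masters):
    """How many master-eligible nodes can be lost while a majority remains."""
    return masters - quorum(masters)


if __name__ == "__main__":
    for n in (3, 4, 9, 10):
        print(f"{n} masters: quorum {quorum(n)}, tolerates {tolerated_failures(n)} failures")
```

Under that assumption, 10 master-eligible nodes tolerate the same number of failures as 9, and 4 the same as 3, so the extra even-numbered node adds cost without adding resilience.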

To give you a few more details from the little I know :slight_smile:

Expected traffic is about 100 million logs/day, with an average size of 1 KB/log. The retention policy is 1 year.
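As a rough back-of-the-envelope check on these numbers, here is a small sketch. It assumes 1 KB = 1000 bytes and a 365-day year, and it only counts raw log volume: replicas, compression and indexing overhead are ignored, so the real on-disk footprint will differ.

```python
LOGS_PER_DAY = 100_000_000
BYTES_PER_LOG = 1_000      # assumption: 1 KB = 1000 bytes
RETENTION_DAYS = 365       # assumption: 1 year = 365 days
SECONDS_PER_DAY = 86_400


def raw_volume_bytes(logs_per_day, bytes_per_log, days=1):
    """Raw (uncompressed, unreplicated) log volume over `days` days."""
    return logs_per_day * bytes_per_log * days


if __name__ == "__main__":
    daily_gb = raw_volume_bytes(LOGS_PER_DAY, BYTES_PER_LOG) / 1e9
    retained_tb = raw_volume_bytes(LOGS_PER_DAY, BYTES_PER_LOG, RETENTION_DAYS) / 1e12
    avg_logs_per_second = LOGS_PER_DAY / SECONDS_PER_DAY
    print(f"raw volume per day:   {daily_gb:.1f} GB")
    print(f"raw volume retained:  {retained_tb:.1f} TB")
    print(f"average ingest rate:  {avg_logs_per_second:.0f} logs/s")
```

That works out to roughly 100 GB of raw data per day, about 36.5 TB over the retention period, and an average of around 1,157 logs per second (peaks will be higher).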

Would 3 masters do for such traffic?
Could 5, 7 or 9 be needed?

Since I need to add some ingestion nodes (the decision has already been made to have dedicated coordinator nodes and ingestion nodes), is there any recommendation for a good ratio between the two?

Dedicated master nodes are not involved in indexing or querying, and you should not send requests to them. They just manage the cluster, so their number does not depend on the size or load of the cluster. For large clusters they may require more RAM/heap, but the count does not need to be increased. I have seen very large clusters with just 3 dedicated master nodes.

I would remove the coordinating-only nodes you have and allocate more resources to the data nodes. The data nodes would then handle requests and also act as ingest nodes. Just because you can create nodes with dedicated roles does not mean you should.
