Not possible to increase node count on ES Cloud

While configuring a cluster on ES Cloud (aws.data.highio.i3, an I/O-optimized Elasticsearch instance running on an AWS i3), I noticed that it is not possible to increase the node count until the node size reaches the maximum amount of RAM (58GB).

To me, having one big node with 30GB of heap is not at all equivalent to having 4 nodes with 8GB of heap. With one node, all requests can only be handled by that single node and there's no way to scale out. Some use cases do not necessarily require huge boxes, but a few smaller ones would fit perfectly as well.

One node, however big it is, will only have limited CPU and network bandwidth, while using more nodes can provide more CPU throughput and allow requests to be parallelized a bit more.

I'm really curious to know the rationale behind this. Thanks in advance for sharing your insights.

@val - thanks for the feedback, on an interesting question that gets a fair bit of internal discussion.

Historically Cloud provided a "one size fits all" type approach for reasons of simplicity, and our empirical findings back then suggested that for most cases maxing out heap size was preferable.

Our recent move to template-based deployments does open the door to alternative node sizing, though it's not something we're actively looking at currently - we're always open to feedback on this! Out of interest, do you have any benchmarking you use to decide between (e.g.) 1x 30GB vs 4x 8GB?

The two specific considerations you mentioned, CPU and network bandwidth, aren't believed to be issues in practice - eg regardless of node size, each ES process runs on a large multi-tenant server with resources carved out using cgroups, and ES scales the thread pool sizing automatically.
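To make the thread pool point concrete, here is a rough sketch of how the default pool sizes follow the number of processors a node is allocated. The formulas are the defaults described in the Elasticsearch thread pool documentation and may differ between versions; the processor counts (8 for one large node, 2 for each small node) are purely hypothetical and are not the actual Cloud cgroup allocations.

```python
# Sketch only: default Elasticsearch thread pool sizing as a function of
# allocated processors. Formulas are taken from the public thread pool docs
# and may vary by version; processor counts below are made-up examples.

def search_pool_size(allocated_processors: int) -> int:
    # Default "search" pool: int((allocated processors * 3) / 2) + 1
    return (allocated_processors * 3) // 2 + 1


def write_pool_size(allocated_processors: int) -> int:
    # Default "write" pool: one thread per allocated processor
    return allocated_processors


# Hypothetical layouts: (label, node count, processors per node)
layouts = [
    ("1 large node", 1, 8),
    ("4 small nodes", 4, 2),
]

for label, nodes, cpus in layouts:
    per_node_search = search_pool_size(cpus)
    per_node_write = write_pool_size(cpus)
    print(
        f"{label}: search threads per node={per_node_search}, "
        f"cluster total={per_node_search * nodes}; "
        f"write threads per node={per_node_write}, "
        f"cluster total={per_node_write * nodes}"
    )
```

Under these assumed numbers the total thread counts end up in the same ballpark either way, since the pools track the CPU share each node is given rather than the number of nodes.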

(In our "on-prem" version of Cloud, customers are more flexible with node sizing, though - again empirically - this seems to be related to logistics or internal billing considerations rather than performance.)

Alex

You are able to deploy multiple smaller nodes by choosing to use more than one availability zone. This does not give you full flexibility, but allows you to run 2 or 3 smaller nodes in a cluster.
