Any guideline with Elasticsearch nodes and AWS instance types?

I'm trying to set up a production ELK stack in AWS. Are there any guidelines for which instance type to use for the different types of nodes? I want to have the following configuration.

3 master Elasticsearch nodes (CPU and memory intensive?)
2 data Elasticsearch nodes (storage intensive?)
2 client Elasticsearch nodes (CPU and memory intensive?)
1 Kibana+Logstash node. Kibana talks to the client nodes, Logstash talks to the data nodes. (IO intensive?)
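For reference, the node roles above map to per-node settings like these in elasticsearch.yml (a sketch assuming the classic `node.master`/`node.data` flags; check the keys against your Elasticsearch version):

```yaml
# Dedicated master-eligible node
node.master: true
node.data: false

# Data node
node.master: false
node.data: true

# Client (coordinating-only) node
node.master: false
node.data: false
```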

The stack will

  • gather logs from multiple application servers on the same AWS region, in real time
  • have multiple users logged into Kibana concurrently running queries
  • have indices open for 30 days; older indices will be closed
  • retain indices for 90 days; older indices will be deleted (via Curator).
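The close-after-30/delete-after-90 policy above could be scripted roughly like this, assuming Curator 3's CLI and daily logstash-style index names (the host and timestring are placeholders for your setup):

```shell
# Close indices older than 30 days, delete those older than 90 (sketch).
curator --host localhost close indices --older-than 30 --time-unit days --timestring '%Y.%m.%d'
curator --host localhost delete indices --older-than 90 --time-unit days --timestring '%Y.%m.%d'
```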

Thanks!

I know this isn't the answer you want, but the real answer is "it depends." It depends on things like how many indices and shards you need to index/query, how much data you plan to push in per day/week/month, and what your query profiles look like.

First of all, good job considering dedicated master-eligible nodes. You're a step ahead of things! Master-eligible nodes tend to be memory sensitive (you really don't want them to run out of heap!), but at the same time you don't want to set the heap too large, as that can lead to long garbage-collection pauses, which can cause real cluster problems. If you plan to have a very large number of shards, a very large number of indices, very large mappings, or other things in the cluster state, you'll need more heap on master-eligible nodes. Still, most master-eligible nodes don't need a huge heap, and they don't store much on disk.

Data nodes are writing and reading data, so things like SSDs tend to help with throughput on both sides if you need it. They also process aggregations, so heap can be important.

As for dedicated client nodes, we don't often see people actually need them. Sometimes, but not always -- especially on small setups. You may want to consider starting without them and only adding them if/when you get to that point.

But the best advice I can probably give is to watch the video at https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing . This will give you a good framework for how to size and architect your clusters.

@shanec thanks. That was helpful. If I don't have a client node, which node do I point Kibana to?

You can point Kibana to any of the data nodes, which will then act as a coordinating/client node for its requests. If you want to set up a dedicated client/coordinating node, you can: it's just not always necessary. If you do, you could even consider putting that coordinating node on the same host as Kibana.
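Pointing Kibana at a data node is just a one-line config change (the host name here is hypothetical, and the key name depends on your Kibana version: `elasticsearch_url` in Kibana 4.x, renamed to `elasticsearch.url` in 5.x):

```yaml
# kibana.yml -- point Kibana directly at one of the data nodes
elasticsearch.url: "http://es-data-1.example.internal:9200"
```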

@shanec gotcha. If I have a coordinating/client node, should it also be allocated 50% of the memory available, like the master and data nodes?

I usually set the heap size to 75% of available RAM on non-data nodes, as they shouldn't need the page cache nearly as much.
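Putting the two rules of thumb together (50% of RAM on master/data nodes, 75% on non-data nodes), the arithmetic looks like this. This is a sketch; the `heap_mb` helper name and the 16 GB example are made up, and in practice you'd also cap the heap around 30-31 GB to stay within compressed oops:

```shell
# Sketch: derive an Elasticsearch heap size from total RAM and a percentage.
# heap_mb TOTAL_RAM_MB PERCENT -> heap size in MB
heap_mb() {
  echo $(( $1 * $2 / 100 ))
}

# On a 16 GB (16384 MB) instance:
heap_mb 16384 50   # data/master node -> 8192 MB
heap_mb 16384 75   # non-data (client/coordinating) node -> 12288 MB
```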

For some specifics with regard to instance sizing, I run the following instance types in my clusters:

  • Master Nodes: t2.large
  • Client Nodes: m4.large
  • Data Nodes: r3.2xlarge (data stored on instance storage)

@daveops, thanks!
