I'm trying to set up a production ELK stack in AWS. Are there any guidelines for which instance types to use for the different types of nodes? I want to have the following configuration.
3 master Elasticsearch nodes (CPU and memory intensive?)
2 data Elasticsearch nodes (storage intensive?)
2 client Elasticsearch nodes (CPU and memory intensive?)
1 Kibana+Logstash node. Kibana talks to the client nodes, Logstash talks to the data nodes. (IO intensive?)
The stack will
gather logs from multiple application servers in the same AWS region, in real time
have multiple users logging into Kibana concurrently to run queries
have indices open for 30 days; older indices will be closed
retain indices for 90 days; older indices will be deleted (via Curator).
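For reference, here is roughly how I was planning to split the roles in each elasticsearch.yml (a minimal sketch, assuming the classic node.master / node.data flags; newer Elasticsearch releases express this with node.roles instead):

```yaml
# elasticsearch.yml sketches for each node type
# (classic node.master / node.data style)

# 3x dedicated master-eligible nodes
node.master: true
node.data: false

# 2x data nodes
node.master: false
node.data: true

# 2x client (coordinating-only) nodes
node.master: false
node.data: false
```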
I know this isn't the answer you want, but the real answer is "it depends." It depends on things like how many indices and shards you need to index/query, how much data you plan to push in per-day/week/month, and what your query profiles look like.
First of all, good job considering dedicated master-eligible nodes. You're a step ahead of things! Master-eligible nodes tend to be memory-sensitive (you really don't want them to run out of heap!), but at the same time you don't want to set the heap too large, as that can lead to long garbage collection pauses, which can cause real cluster problems. If you plan to have a very large number of shards, a very large number of indices, very large mappings, or other things that bloat the cluster state, you'll need more heap on the master-eligible nodes. Still, most master-eligible nodes we see don't need a huge heap, and they don't store a lot on disk.
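As a rough illustration only (the right size depends entirely on your cluster state), a dedicated master-eligible node often runs with a modest, fixed heap. Depending on your Elasticsearch version this is set via the ES_HEAP_SIZE environment variable or in jvm.options:

```
# jvm.options on a dedicated master-eligible node
# Sizes are illustrative, not a recommendation.
# Pin min and max to the same value so the heap never resizes at runtime.
-Xms4g
-Xmx4g
```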
Data nodes both write and read data, so things like SSDs tend to help with throughput on both sides if you need it. They also process aggregations, so heap can be important there too.
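If you do go with SSD-backed instances, a hedged sketch of a data node's config, using two hypothetical instance-store mount points, might look like:

```yaml
# elasticsearch.yml on a data node (mount points are hypothetical examples)
node.master: false
node.data: true
path.data:
  - /mnt/ssd0/elasticsearch
  - /mnt/ssd1/elasticsearch
```

Elasticsearch will spread shard data across the listed paths, so you don't strictly need RAID just to use both volumes.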
As for dedicated client nodes, we don't often see people actually need them. Some deployments do, but not most, especially smaller setups. You may want to consider leaving them out to start with and only adding them if/when you find you need them.
You can point Kibana at any of the data nodes, which will then act as the coordinating/client node for its requests. If you want to set up a dedicated client/coordinating node, you can; it's just not always necessary. If you do, you could even consider running that coordinating node on the same host as Kibana.
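A minimal kibana.yml sketch of that (the address is just an example; the setting is elasticsearch.url in older Kibana releases and elasticsearch.hosts in newer ones):

```yaml
# kibana.yml -- point Kibana at a coordinating node on the same host,
# or at any data node if you skip dedicated coordinating nodes
elasticsearch.url: "http://localhost:9200"
```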