Also, I see that Elastic recommends instance storage for data nodes, and per the above the I-series is recommended. But the I-series instances also have EBS attached; won't this cause performance issues, such as a slowdown compared with using EBS directly?
Also, it is mentioned that data nodes are scaled horizontally. Are they scaled using Auto Scaling groups? If not, what would be a good option for scaling the data nodes as the data grows?
Do I really need coordinating nodes? If so, could I just install one on the Kibana EC2 instance, or would it be better to have a dedicated EC2 instance for a coordinating node?
EC2 instance sizes depend a lot on your cluster size; however, as a general rule of thumb, here are some recommendations:
Master-eligible nodes: The purpose of master nodes is to maintain the cluster state. Machines with low CPU, RAM and disk resources are suitable for this role.
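Data nodes: These store data and process client requests. They are the powerhouse of Elasticsearch, so they should have plenty of CPU, RAM and disk resources. The maximum heap size is around 30G (just below the point where the JVM stops using compressed object pointers), and the heap should take no more than about half of the machine's RAM, so you shouldn't provision a node with more than 64G of RAM.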
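The heap arithmetic behind the 64G figure can be sketched as follows. This is a hypothetical helper, not an Elastic tool; it assumes the "half of RAM" guideline and uses the ~30G cap quoted above. It shows that beyond roughly 64G the heap stops growing and any extra RAM only goes to the OS filesystem cache.

```python
# Hypothetical helper: suggest a JVM heap size (GB) for an Elasticsearch data node.
# Assumptions: heap <= half of RAM, capped at ~30 GB (the figure quoted above),
# which keeps the heap below the JVM's compressed-oops threshold.
HEAP_CAP_GB = 30


def suggested_heap_gb(ram_gb: int) -> int:
    """Return half of the machine's RAM, capped at HEAP_CAP_GB."""
    return min(ram_gb // 2, HEAP_CAP_GB)


for ram in (16, 32, 64, 128):
    heap = suggested_heap_gb(ram)
    # RAM not given to the heap is left to the OS filesystem cache,
    # which Lucene relies on for fast reads of index segments.
    print(f"{ram} GB RAM -> -Xms{heap}g -Xmx{heap}g, {ram - heap} GB left for page cache")
```

`-Xms` and `-Xmx` are set to the same value so the heap is not resized at runtime.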
Coordinating nodes: Use low disk, medium CPU and medium RAM here. These nodes perform the gather/reduce phase of search queries, merging the partial results returned by each data node.
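Each role above maps to a `node.roles` line in that node's `elasticsearch.yml`. The sketch below assumes Elasticsearch 7.9 or later. Older versions use the `node.master` / `node.data` / `node.ingest` booleans instead. An empty list makes a node coordinating-only: it routes requests and reduces results but holds no shards. The dictionary keys are just illustrative labels.

```python
# Sketch: the node.roles line each node type would carry in elasticsearch.yml
# (assumes Elasticsearch 7.9+). The keys below are illustrative labels only.
NODE_ROLES = {
    "master-eligible": ["master"],
    "data": ["data"],
    "coordinating-only": [],  # empty list = coordinating-only node
}


def roles_line(node_type: str) -> str:
    """Render the node.roles setting for one node type."""
    roles = NODE_ROLES[node_type]
    if not roles:
        return "node.roles: [ ]"
    return "node.roles: [ " + ", ".join(roles) + " ]"


for node_type in NODE_ROLES:
    print(f"{node_type:18} {roles_line(node_type)}")
```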
Using an ASG may not be a good option for your database solution, because the data patterns are typically not expected to be transient. Also, the cluster performs rebalance operations whenever nodes join or leave, so nodes should remain in the cluster for a decent amount of time for optimal performance.
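Without an ASG, scaling becomes a planned operation. You estimate how many data nodes the data needs, add nodes, and drain a node's shards before removing it. The sketch below assumes every primary has `replicas` copies and that each node's disk usage stays under the default 85% low disk watermark (`cluster.routing.allocation.disk.watermark.low`). The function names and the numbers in the usage lines are illustrative. The drain uses the real `cluster.routing.allocation.exclude._name` setting.

```python
import json
import math

# Default low disk watermark: above this usage, no new shards are allocated to a node.
LOW_WATERMARK = 0.85


def data_nodes_needed(primary_data_gb: float, replicas: int, disk_per_node_gb: float) -> int:
    """Hypothetical estimate of data nodes needed to hold all shard copies."""
    total_gb = primary_data_gb * (1 + replicas)  # primaries plus replica copies
    return math.ceil(total_gb / (disk_per_node_gb * LOW_WATERMARK))


def drain_node_settings(node_name: str) -> dict:
    """Body for PUT _cluster/settings that moves all shards off node_name."""
    return {"persistent": {"cluster.routing.allocation.exclude._name": node_name}}


# Illustrative numbers: 3 TB of primary data, 1 replica, ~1.9 TB of disk per node.
print(data_nodes_needed(primary_data_gb=3000, replicas=1, disk_per_node_gb=1900))
print(json.dumps(drain_node_settings("data-node-3"), indent=2))
```

Once the excluded node holds no shards, it can be stopped safely. Resetting the exclude setting to `null` afterwards keeps it from affecting a future node that reuses the same name.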