Would it be recommended to place an Auto Scaling group and a load balancer (ALB) between the data nodes in AWS? What would be the pros and cons of this approach?
No. It's not recommended.
You should control that manually IMHO.
Adding a new node is an important decision, especially if you have a huge volume of data.
Thanks, dadoonet! What would be the deciding factor in determining the number of data nodes?
In my opinion, the number of shards should be the major deciding factor for the number of data nodes. As a rule of thumb, aim for no more than around 20-25 shards per 1 GB of heap. The heap of a data node should also stay below approximately 32 GB so that the JVM can keep using compressed object pointers (https://www.elastic.co/guide/en/elasticsearch/guide/current/heap-sizing.html#compressed_oops). So typically, you shouldn't exceed about 800 shards (25 × 32) on a single data node.
If you need to store a larger volume of data, you can expand the attached disk volumes.
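To make the rule of thumb above concrete, here is a rough sketch of the arithmetic in Python. The function names and the example figure of 3000 shards are made up for illustration, and I'm assuming the shard count includes both primaries and replicas. Treat the result as a starting point for sizing, not a guarantee.

```python
import math

# Figures taken from the rule of thumb in this post; adjust for your cluster.
SHARDS_PER_GB_HEAP = 25  # upper end of the 20-25 shards per GB range
MAX_HEAP_GB = 32         # approximate heap ceiling for a single data node


def max_shards_per_node(heap_gb, shards_per_gb=SHARDS_PER_GB_HEAP):
    """Upper bound on shards for one data node with the given heap size."""
    if heap_gb <= 0:
        raise ValueError("heap_gb must be positive")
    # Heap beyond the ~32 GB ceiling is not counted towards the limit.
    usable_heap_gb = min(heap_gb, MAX_HEAP_GB)
    return int(usable_heap_gb * shards_per_gb)


def data_nodes_needed(total_shards, heap_gb, shards_per_gb=SHARDS_PER_GB_HEAP):
    """Minimum number of data nodes so no node exceeds its shard limit."""
    per_node = max_shards_per_node(heap_gb, shards_per_gb)
    return math.ceil(total_shards / per_node)


if __name__ == "__main__":
    print(max_shards_per_node(32))      # 25 * 32 = 800
    print(data_nodes_needed(3000, 32))  # ceil(3000 / 800) = 4
```

This only covers the shard-count side of sizing. Disk capacity and query load have to be checked separately.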
In addition to the previous answer, may I suggest you look at the following resources about sizing:
https://www.elastic.co/elasticon/conf/2016/sf/quantitative-cluster-sizing
And https://www.elastic.co/webinars/using-rally-to-get-your-elasticsearch-cluster-size-right