One failed data node causes the HTTP connection to the master node to disconnect (6 data nodes)

lhong1 · September 21, 2017, 10:29pm

Hi,

I am trying to set up an Elasticsearch (ES) cluster directly on EC2 instances instead of using the AWS-managed Elasticsearch service.

I set up an ES cluster with 1 dedicated master node and 6 dedicated data nodes (all on EC2 m4.large instances with 2 vCPUs and 8 GB RAM).

Then I took a snapshot of an index (1.2M docs, ~200 GB, 40 shards with 1 replica) from the AWS ES cluster and restored it to my own EC2 cluster, with all default settings except "index.unassigned.node_left.delayed_timeout": "10m".
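For reference, the one non-default setting above can be applied to an existing index through the index settings endpoint (PUT /<index>/_settings). The sketch below only builds that request with the standard library; the base URL and the index name used in the check are hypothetical placeholders, and nothing is sent until you pass the request to urllib.request.urlopen.

```python
import json
import urllib.request


def delayed_timeout_request(base_url: str, index: str, timeout: str = "10m") -> urllib.request.Request:
    """Build (but do not send) a PUT /<index>/_settings request that sets
    index.unassigned.node_left.delayed_timeout. base_url and index are caller-supplied."""
    body = {"settings": {"index.unassigned.node_left.delayed_timeout": timeout}}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{index}/_settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Usage, assuming a node reachable at a hypothetical http://localhost:9200: urllib.request.urlopen(delayed_timeout_request("http://localhost:9200", "my-index")).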

Next, I ran a Python script that uses the scroll and bulk APIs to reindex this index into a new index on the same cluster, using the master node as the endpoint.
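The script itself isn't shown, so here is a minimal sketch of the bulk half of such a scroll-and-bulk loop, assuming each scroll hit is a dict with "_id" and "_source". It turns a batch of hits into the newline-delimited body the _bulk endpoint expects (an action line, then the document, with a trailing newline). The helper names bulk_body and chunked are hypothetical, and the exact metadata fields depend on the Elasticsearch version (older releases also expect a _type, omitted here). Keeping batches small bounds how much each bulk request carries to the receiving node.

```python
import json
from typing import Any, Iterable, Iterator, List


def bulk_body(hits: Iterable[dict], dest_index: str) -> str:
    """Serialize scroll hits into an NDJSON body for the _bulk endpoint."""
    lines = []
    for hit in hits:
        # Action line: reuse the original _id so a re-run overwrites instead of duplicating.
        lines.append(json.dumps({"index": {"_index": dest_index, "_id": hit["_id"]}}))
        # Source line: the document body itself.
        lines.append(json.dumps(hit["_source"]))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


def chunked(items: Iterable[Any], size: int) -> Iterator[List[Any]]:
    """Yield lists of at most `size` items, e.g. to cap each bulk request."""
    batch: List[Any] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch
```

Each batch would then be POSTed to /_bulk; newer releases expect the Content-Type application/x-ndjson for this body.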

Previously, this index lived on an AWS Elasticsearch Service cluster with 8 t2.medium instances, where the reindex took about 8 hours and finished without any problems.

However, on my own cluster I ran into two issues:

1. A data node always dies with an OOM (heap size) error.
2. Once this happens, my Python script dies shortly afterwards with a connection timeout, even though my master node never died.
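Because the script dies on a single timeout, one way to let a long reindex ride out a node failure is to retry the failing call with exponential backoff. This is a minimal sketch, not the poster's code: with_retries is a hypothetical helper, the HTTP call is assumed to be wrapped in a zero-argument callable, and TimeoutError/ConnectionError are placeholders, since the exception types that signal a timeout depend on the HTTP client in use.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    call: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (TimeoutError, ConnectionError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying with exponential backoff on the listed exception types."""
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * (2 ** attempt))
    raise ValueError("attempts must be at least 1")
```

The sleep parameter is injectable so the backoff can be tested without actually waiting.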

So my question is:

When a data node dies, shouldn't the master node automatically stop sending traffic to it, given that the master's ES log clearly shows it detected the failed node and removed it from the cluster?
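Separately from how the cluster reroutes work internally, the script currently depends on a single endpoint. A client-side mitigation is to give it several node URLs and skip any that stop responding. The sketch below is a hypothetical stdlib-only illustration (NodePool and send_with_failover are not from any library, and send stands in for whatever function performs the HTTP request). The official Python client has its own options for this, such as a list of hosts and sniffing, so check its documentation for the version in use.

```python
import itertools
from typing import Callable, Iterable, Optional, Set, Tuple, Type, TypeVar

T = TypeVar("T")


class NodePool:
    """Round-robin over node URLs, skipping any that have been marked dead."""

    def __init__(self, urls: Iterable[str]) -> None:
        self.urls = list(urls)
        self.dead: Set[str] = set()
        self._cycle = itertools.cycle(self.urls)

    def next_url(self) -> str:
        # Check each URL at most once per call.
        for _ in range(len(self.urls)):
            url = next(self._cycle)
            if url not in self.dead:
                return url
        raise RuntimeError("no live nodes left")

    def mark_dead(self, url: str) -> None:
        self.dead.add(url)

    def mark_alive(self, url: str) -> None:
        self.dead.discard(url)


def send_with_failover(
    pool: NodePool,
    send: Callable[[str], T],
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Try live nodes in turn; mark a node dead when its request fails."""
    last_error: Optional[BaseException] = None
    for _ in range(len(pool.urls)):
        try:
            url = pool.next_url()
        except RuntimeError:
            break  # every node is already marked dead
        try:
            return send(url)
        except retry_on as exc:
            pool.mark_dead(url)
            last_error = exc
    raise RuntimeError("all nodes failed") from last_error
```

A real script would also need some way to call mark_alive once a node (or its replacement) rejoins, for example a periodic health check.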

I am less concerned about the OOM error, since my design automatically scales out by adding a new data node to the cluster when CPU load increases on the remaining data nodes.

Thanks,

