ES Cluster stops responding after trying to load data

cookandy · September 25, 2016, 2:42am

Hello,

I am testing ES with a very simple 2-node cluster. Upon startup, everything looks good.

When I query the cluster health on either node, it returns green:

Node 1:

> curl 'http://10.138.160.210:31972/_cluster/health?pretty'
{
  "cluster_name" : "es-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
   ....
  "active_shards_percent_as_number" : 100.0
}

Node 2 returns the same.

However, as soon as I try to load sample data, it just hangs.

I am loading data with:

curl -XPOST 'http://10.138.96.56:31212/bank/account/_bulk?pretty' --data-binary "@accounts.json"

But the curl just hangs indefinitely and never finishes. All I see in the ES logs during this data load is:

[2016-09-25 02:24:14,049][INFO ][cluster.metadata         ] [elastic-search-31212] [bank] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings [account]
[2016-09-25 02:24:14,724][INFO ][cluster.routing.allocation] [elastic-search-31212] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[bank][4], [bank][4]] ...]).
[2016-09-25 02:24:14,781][INFO ][cluster.metadata         ] [elastic-search-31212] [bank] update_mapping [account]

And when I look at the cluster status, it's no longer green.

The curl never completes and the cluster health never returns to green. Any ideas what is happening? I'm using ES 2.3 running in a docker container.

Thanks.

cookandy · September 25, 2016, 2:42am

Here are the startup logs (couldn't include them in the original post due to character limit):

Node 1 startup log

[2016-09-25 02:20:56,246][WARN ][bootstrap                ] unable to install syscall filter: seccomp unavailable: your kernel is buggy and you should upgrade
[2016-09-25 02:20:56,411][INFO ][node                     ] [elastic-search-31972] version[2.3.5], pid[11], build[90f439f/2016-07-27T10:36:52Z]
[2016-09-25 02:20:56,411][INFO ][node                     ] [elastic-search-31972] initializing ...
[2016-09-25 02:20:56,965][INFO ][plugins                  ] [elastic-search-31972] modules [reindex, lang-expression, lang-groovy], plugins [], sites []
[2016-09-25 02:20:56,991][INFO ][env                      ] [elastic-search-31972] using [1] data paths, mounts [[/usr/share/elasticsearch/data (/dev/vda1)]], net usable_space [67.8gb], net total_space [78.6gb], spins? [possibly], types [ext4]
[2016-09-25 02:20:56,992][INFO ][env                      ] [elastic-search-31972] heap size [990.7mb], compressed ordinary object pointers [true]
[2016-09-25 02:20:58,956][INFO ][node                     ] [elastic-search-31972] initialized
[2016-09-25 02:20:58,957][INFO ][node                     ] [elastic-search-31972] starting ...
[2016-09-25 02:20:59,033][INFO ][transport                ] [elastic-search-31972] publish_address {10.138.160.210:31973}, bound_addresses {[::]:9300}
[2016-09-25 02:20:59,038][INFO ][discovery                ] [elastic-search-31972] es-cluster/uA9nSCnnRxapmen24AaT5Q
[2016-09-25 02:21:29,041][WARN ][discovery                ] [elastic-search-31972] waited for 30s and no initial state was set by the discovery
[2016-09-25 02:21:29,052][INFO ][http                     ] [elastic-search-31972] publish_address {10.138.160.210:9200}, bound_addresses {[::]:9200}
[2016-09-25 02:21:29,052][INFO ][node                     ] [elastic-search-31972] started
[2016-09-25 02:21:29,296][INFO ][cluster.service          ] [elastic-search-31972] detected_master {elastic-search-31212}{Gw-EirQpTduPgnwM2Ll18A}{10.138.96.56}{10.138.96.56:31213}, added {{elastic-search-31212}{Gw-EirQpTduPgnwM2Ll18A}{10.138.96.56}{10.138.96.56:31213},}, reason: zen-disco-receive(from master [{elastic-search-31212}{Gw-EirQpTduPgnwM2Ll18A}{10.138.96.56}{10.138.96.56:31213}])

Node 2 startup log

[2016-09-25 02:20:56,326][WARN ][bootstrap                ] unable to install syscall filter: seccomp unavailable: your kernel is buggy and you should upgrade
[2016-09-25 02:20:56,491][INFO ][node                     ] [elastic-search-31212] version[2.3.5], pid[10], build[90f439f/2016-07-27T10:36:52Z]
[2016-09-25 02:20:56,492][INFO ][node                     ] [elastic-search-31212] initializing ...
[2016-09-25 02:20:57,092][INFO ][plugins                  ] [elastic-search-31212] modules [reindex, lang-expression, lang-groovy], plugins [], sites []
[2016-09-25 02:20:57,120][INFO ][env                      ] [elastic-search-31212] using [1] data paths, mounts [[/usr/share/elasticsearch/data (/dev/vda1)]], net usable_space [67.9gb], net total_space [78.6gb], spins? [possibly], types [ext4]
[2016-09-25 02:20:57,120][INFO ][env                      ] [elastic-search-31212] heap size [990.7mb], compressed ordinary object pointers [true]
[2016-09-25 02:20:59,114][INFO ][node                     ] [elastic-search-31212] initialized
[2016-09-25 02:20:59,115][INFO ][node                     ] [elastic-search-31212] starting ...
[2016-09-25 02:20:59,220][INFO ][transport                ] [elastic-search-31212] publish_address {10.138.96.56:31213}, bound_addresses {[::]:9300}
[2016-09-25 02:20:59,225][INFO ][discovery                ] [elastic-search-31212] es-cluster/Gw-EirQpTduPgnwM2Ll18A
[2016-09-25 02:21:29,228][WARN ][discovery                ] [elastic-search-31212] waited for 30s and no initial state was set by the discovery
[2016-09-25 02:21:29,255][INFO ][http                     ] [elastic-search-31212] publish_address {10.138.96.56:9200}, bound_addresses {[::]:9200}
[2016-09-25 02:21:29,255][INFO ][node                     ] [elastic-search-31212] started
[2016-09-25 02:21:29,266][INFO ][cluster.service          ] [elastic-search-31212] new_master {elastic-search-31212}{Gw-EirQpTduPgnwM2Ll18A}{10.138.96.56}{10.138.96.56:31213}, added {{elastic-search-31972}{uA9nSCnnRxapmen24AaT5Q}{10.138.160.210}{10.138.160.210:31973},}, reason: zen-disco-join(elected_as_master, [1] joins received)
[2016-09-25 02:21:29,390][INFO ][gateway                  ] [elastic-search-31212] recovered [0] indices into cluster_state

kstaken · September 26, 2016, 8:57pm

How big is that accounts.json file? Bulk requests need to be reasonably sized so if it's large you may have to break it up.
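To make "break it up" concrete, here is a minimal Python sketch of splitting a bulk file into smaller request bodies. It assumes every document is an action line followed by one source line (the layout used by index actions); `chunk_bulk_lines` and `docs_per_chunk` are names made up for this illustration, not part of any Elasticsearch client.

```python
from typing import Iterable, Iterator


def chunk_bulk_lines(lines: Iterable[str], docs_per_chunk: int) -> Iterator[str]:
    """Yield bulk request bodies holding at most docs_per_chunk documents each.

    Assumption: each document is an action line followed by one source line.
    """
    if docs_per_chunk < 1:
        raise ValueError("docs_per_chunk must be at least 1")
    # Drop blank lines so a trailing newline does not break the pairing.
    cleaned = [line.rstrip("\n") for line in lines if line.strip()]
    if len(cleaned) % 2 != 0:
        raise ValueError("expected action/source line pairs")
    step = docs_per_chunk * 2
    for start in range(0, len(cleaned), step):
        # A bulk body has to end with a newline.
        yield "\n".join(cleaned[start:start + step]) + "\n"
```

Each yielded string could then be posted to the `_bulk` endpoint as its own request, for example by writing it to a temporary file and passing that to curl with `--data-binary`.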

Kimbro

cookandy · September 26, 2016, 9:38pm

Hi @kstaken. The accounts.json file is just the sample one from the ES website and is only 245k in size.

After further inspection, this appears to be a problem with IPSec. I'm guessing it has something to do with the MTU, as I am able to load smaller files without a problem. The issue also only occurs in a cluster: I can load the accounts.json file on a single server with no problem.

I have opened a GitHub issue, as I think this is a problem with the size of the packets going between the ES nodes. Even though I've configured iptables to force the MTU to 1460 (to account for IPSec overhead), I still have this problem.
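For a rough sense of the numbers, here is a small Python sketch of the packet arithmetic. It assumes IPv4 and TCP headers at their minimum 20 bytes each (no options) and treats "245k" as 245 * 1024 bytes. The function names are made up for this illustration, and it does not model the IPSec overhead itself.

```python
import math

IPV4_HEADER_BYTES = 20  # minimum IPv4 header, assuming no options
TCP_HEADER_BYTES = 20   # minimum TCP header, assuming no options


def tcp_payload_per_packet(mtu: int) -> int:
    """Largest TCP payload that fits in one packet of the given MTU."""
    return mtu - IPV4_HEADER_BYTES - TCP_HEADER_BYTES


def packets_needed(payload_bytes: int, mtu: int) -> int:
    """Number of packets needed to carry payload_bytes at that MTU."""
    return math.ceil(payload_bytes / tcp_payload_per_packet(mtu))


print(tcp_payload_per_packet(1460))        # 1420 bytes of payload per packet
print(packets_needed(245 * 1024, 1460))    # 177 packets for the sample file
```

Under those assumptions the sample file is spread over many full-size packets, while a small enough request fits in a single packet well below the limit. That would be consistent with small loads working and large ones stalling, though it does not prove the MTU is the cause.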

I've posted more details on the GH issue page here:

Any help you can provide would be awesome!
