ES Cluster stops responding after trying to load data


(Cookandy) #1

Hello,

I am testing ES with a very simple 2-node cluster. Upon startup, everything looks good.

When I query the health of the cluster on either node, it returns green:

Node 1:

> curl 'http://10.138.160.210:31972/_cluster/health?pretty'
{
  "cluster_name" : "es-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 2,
   ....
  "active_shards_percent_as_number" : 100.0
}

Node 2 returns the same.
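
For anyone who wants to script this check instead of reading the output by eye, here is a minimal sketch in Python. The function name `cluster_is_healthy` and the `expected_nodes` parameter are made up for illustration, and the sample response is trimmed to the fields shown in the output above.

```python
import json


def cluster_is_healthy(health_json: str, expected_nodes: int) -> bool:
    """Return True when the response is green, not timed out, and lists every node."""
    health = json.loads(health_json)
    return (
        health.get("status") == "green"
        and health.get("timed_out") is False
        and health.get("number_of_nodes") == expected_nodes
    )


# Sample _cluster/health response, trimmed to the fields shown in the post.
sample = """
{
  "cluster_name": "es-cluster",
  "status": "green",
  "timed_out": false,
  "number_of_nodes": 2,
  "number_of_data_nodes": 2,
  "active_shards_percent_as_number": 100.0
}
"""

print(cluster_is_healthy(sample, expected_nodes=2))  # prints True
```

Checking the node count as well as the status helps catch the case where each node reports green only because it formed a cluster on its own.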

However, as soon as I try to load sample data, it just hangs.

I am loading data with:

curl -XPOST 'http://10.138.96.56:31212/bank/account/_bulk?pretty' --data-binary "@accounts.json"

But the curl just hangs indefinitely and never finishes. All I see in the ES logs during this data load is:

[2016-09-25 02:24:14,049][INFO ][cluster.metadata         ] [elastic-search-31212] [bank] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings [account]
[2016-09-25 02:24:14,724][INFO ][cluster.routing.allocation] [elastic-search-31212] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[bank][4], [bank][4]] ...]).
[2016-09-25 02:24:14,781][INFO ][cluster.metadata         ] [elastic-search-31212] [bank] update_mapping [account]
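
To pull the health transitions out of a longer log, a rough sketch like the following could help. It assumes the 2.x log line layout shown above (timestamp in the first bracket pair, then the "Cluster health status changed" message); the names `HEALTH_CHANGE` and `health_transitions` are hypothetical.

```python
import re

# Matches the leading timestamp and the old/new status in a health change line.
HEALTH_CHANGE = re.compile(
    r"^\[(?P<timestamp>[^\]]+)\].*"
    r"Cluster health status changed from \[(?P<old>\w+)\] to \[(?P<new>\w+)\]"
)


def health_transitions(lines):
    """Return (timestamp, old_status, new_status) for each health change line."""
    transitions = []
    for line in lines:
        match = HEALTH_CHANGE.search(line)
        if match:
            transitions.append((match["timestamp"], match["old"], match["new"]))
    return transitions


# The three log lines from the post.
log_lines = [
    "[2016-09-25 02:24:14,049][INFO ][cluster.metadata         ] [elastic-search-31212] [bank] creating index, cause [auto(bulk api)], templates [], shards [5]/[1], mappings [account]",
    "[2016-09-25 02:24:14,724][INFO ][cluster.routing.allocation] [elastic-search-31212] Cluster health status changed from [RED] to [YELLOW] (reason: [shards started [[bank][4], [bank][4]] ...]).",
    "[2016-09-25 02:24:14,781][INFO ][cluster.metadata         ] [elastic-search-31212] [bank] update_mapping [account]",
]

for timestamp, old, new in health_transitions(log_lines):
    print(timestamp, old, "->", new)
```

With the lines above, the only transition found is RED to YELLOW; there is no later line taking the cluster back to GREEN.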

And when I look at the cluster status, it's no longer green.

The curl never completes and the cluster health never returns to green. Any ideas what is happening? I'm using ES 2.3.5 running in a Docker container.

Thanks.


(Cookandy) #2

Here are the startup logs (couldn't include them in the original post due to character limit):

Node 1 startup log

[2016-09-25 02:20:56,246][WARN ][bootstrap                ] unable to install syscall filter: seccomp unavailable: your kernel is buggy and you should upgrade
[2016-09-25 02:20:56,411][INFO ][node                     ] [elastic-search-31972] version[2.3.5], pid[11], build[90f439f/2016-07-27T10:36:52Z]
[2016-09-25 02:20:56,411][INFO ][node                     ] [elastic-search-31972] initializing ...
[2016-09-25 02:20:56,965][INFO ][plugins                  ] [elastic-search-31972] modules [reindex, lang-expression, lang-groovy], plugins [], sites []
[2016-09-25 02:20:56,991][INFO ][env                      ] [elastic-search-31972] using [1] data paths, mounts [[/usr/share/elasticsearch/data (/dev/vda1)]], net usable_space [67.8gb], net total_space [78.6gb], spins? [possibly], types [ext4]
[2016-09-25 02:20:56,992][INFO ][env                      ] [elastic-search-31972] heap size [990.7mb], compressed ordinary object pointers [true]
[2016-09-25 02:20:58,956][INFO ][node                     ] [elastic-search-31972] initialized
[2016-09-25 02:20:58,957][INFO ][node                     ] [elastic-search-31972] starting ...
[2016-09-25 02:20:59,033][INFO ][transport                ] [elastic-search-31972] publish_address {10.138.160.210:31973}, bound_addresses {[::]:9300}
[2016-09-25 02:20:59,038][INFO ][discovery                ] [elastic-search-31972] es-cluster/uA9nSCnnRxapmen24AaT5Q
[2016-09-25 02:21:29,041][WARN ][discovery                ] [elastic-search-31972] waited for 30s and no initial state was set by the discovery
[2016-09-25 02:21:29,052][INFO ][http                     ] [elastic-search-31972] publish_address {10.138.160.210:9200}, bound_addresses {[::]:9200}
[2016-09-25 02:21:29,052][INFO ][node                     ] [elastic-search-31972] started
[2016-09-25 02:21:29,296][INFO ][cluster.service          ] [elastic-search-31972] detected_master {elastic-search-31212}{Gw-EirQpTduPgnwM2Ll18A}{10.138.96.56}{10.138.96.56:31213}, added {{elastic-search-31212}{Gw-EirQpTduPgnwM2Ll18A}{10.138.96.56}{10.138.96.56:31213},}, reason: zen-disco-receive(from master [{elastic-search-31212}{Gw-EirQpTduPgnwM2Ll18A}{10.138.96.56}{10.138.96.56:31213}])

Node 2 startup log

[2016-09-25 02:20:56,326][WARN ][bootstrap                ] unable to install syscall filter: seccomp unavailable: your kernel is buggy and you should upgrade
[2016-09-25 02:20:56,491][INFO ][node                     ] [elastic-search-31212] version[2.3.5], pid[10], build[90f439f/2016-07-27T10:36:52Z]
[2016-09-25 02:20:56,492][INFO ][node                     ] [elastic-search-31212] initializing ...
[2016-09-25 02:20:57,092][INFO ][plugins                  ] [elastic-search-31212] modules [reindex, lang-expression, lang-groovy], plugins [], sites []
[2016-09-25 02:20:57,120][INFO ][env                      ] [elastic-search-31212] using [1] data paths, mounts [[/usr/share/elasticsearch/data (/dev/vda1)]], net usable_space [67.9gb], net total_space [78.6gb], spins? [possibly], types [ext4]
[2016-09-25 02:20:57,120][INFO ][env                      ] [elastic-search-31212] heap size [990.7mb], compressed ordinary object pointers [true]
[2016-09-25 02:20:59,114][INFO ][node                     ] [elastic-search-31212] initialized
[2016-09-25 02:20:59,115][INFO ][node                     ] [elastic-search-31212] starting ...
[2016-09-25 02:20:59,220][INFO ][transport                ] [elastic-search-31212] publish_address {10.138.96.56:31213}, bound_addresses {[::]:9300}
[2016-09-25 02:20:59,225][INFO ][discovery                ] [elastic-search-31212] es-cluster/Gw-EirQpTduPgnwM2Ll18A
[2016-09-25 02:21:29,228][WARN ][discovery                ] [elastic-search-31212] waited for 30s and no initial state was set by the discovery
[2016-09-25 02:21:29,255][INFO ][http                     ] [elastic-search-31212] publish_address {10.138.96.56:9200}, bound_addresses {[::]:9200}
[2016-09-25 02:21:29,255][INFO ][node                     ] [elastic-search-31212] started
[2016-09-25 02:21:29,266][INFO ][cluster.service          ] [elastic-search-31212] new_master {elastic-search-31212}{Gw-EirQpTduPgnwM2Ll18A}{10.138.96.56}{10.138.96.56:31213}, added {{elastic-search-31972}{uA9nSCnnRxapmen24AaT5Q}{10.138.160.210}{10.138.160.210:31973},}, reason: zen-disco-join(elected_as_master, [1] joins received)
[2016-09-25 02:21:29,390][INFO ][gateway                  ] [elastic-search-31212] recovered [0] indices into cluster_state

(Kimbro Staken) #3

How big is that accounts.json file? Bulk requests need to be reasonably sized so if it's large you may have to break it up.
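
If the file does turn out to be too large, one way to break it up is sketched below in Python. It assumes every document is an action line followed by a source line (as with index actions), so delete actions are not handled. The function name `chunk_bulk_lines`, the `docs_per_request` parameter, and the sample documents are made up for illustration.

```python
from typing import Iterable, Iterator, List


def chunk_bulk_lines(lines: Iterable[str], docs_per_request: int) -> Iterator[str]:
    """Yield bulk request bodies holding at most docs_per_request documents each.

    Each document is assumed to occupy two lines: the action line and the source line.
    """
    if docs_per_request < 1:
        raise ValueError("docs_per_request must be at least 1")
    batch: List[str] = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines such as the trailing newline
        batch.append(line)
        if len(batch) == 2 * docs_per_request:
            # A bulk body must end with a newline.
            yield "\n".join(batch) + "\n"
            batch = []
    if batch:
        yield "\n".join(batch) + "\n"


# Three made-up documents in action/source pairs.
sample = [
    '{"index":{"_id":"1"}}',
    '{"account_number":1,"balance":100}',
    '{"index":{"_id":"2"}}',
    '{"account_number":2,"balance":200}',
    '{"index":{"_id":"3"}}',
    '{"account_number":3,"balance":300}',
]

bodies = list(chunk_bulk_lines(sample, docs_per_request=2))
print(len(bodies))  # prints 2
```

Each yielded body can then be sent as its own `_bulk` request, for example by writing it to a temporary file and passing that to curl with `--data-binary`.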

Kimbro


(Cookandy) #4

Hi @kstaken. The accounts.json file is just the sample one from the ES website and is only 245k in size.

After further inspection, this appears to be a problem with IPsec. I'm guessing it has something to do with the MTU, as I am able to load smaller files without a problem. The issue also only occurs once the nodes form a cluster: I can load the accounts.json file into a single server with no problem.

I have opened a GitHub issue, as I think this is a problem with the size of the packets going between the ES nodes. Even though I've configured iptables to force the MTU to 1460 (to account for IPsec overhead), I still have this problem.
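
To make the packet-size reasoning concrete, here is a back-of-the-envelope sketch in Python. It assumes IPv4 and TCP headers of 20 bytes each with no options, and it treats "245k" as 245 * 1024 bytes; neither assumption is confirmed for this setup, and the actual IPsec overhead depends on the cipher and mode in use. The helper names are hypothetical.

```python
import math

IPV4_HEADER_BYTES = 20  # minimal IPv4 header, no options (assumption)
TCP_HEADER_BYTES = 20   # minimal TCP header, no options (assumption)


def tcp_mss(mtu: int) -> int:
    """Largest TCP payload that fits in one packet of the given MTU."""
    return mtu - IPV4_HEADER_BYTES - TCP_HEADER_BYTES


def segments_needed(payload_bytes: int, mtu: int) -> int:
    """Number of full-or-partial TCP segments needed to carry the payload."""
    return math.ceil(payload_bytes / tcp_mss(mtu))


forced_mtu = 1460              # value forced via iptables, from the post
file_bytes = 245 * 1024        # "245k", assumed to mean 245 KiB

print(tcp_mss(forced_mtu))                     # prints 1420
print(segments_needed(file_bytes, forced_mtu))
print(segments_needed(1000, forced_mtu))       # prints 1
```

A request under about 1.4 KB fits in a single segment, while the sample file needs many full-size ones. That would be consistent with small loads working and the larger load stalling if full-size packets are being dropped somewhere on the path between the nodes.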

I've posted more details on the GH issue page here:

Any help you can provide would be awesome!

