Problems after node restart

Maxwell_Flanders · August 16, 2016, 9:47pm

A data node in our cluster (3 master, 7 data, 3 client nodes) died recently, apparently from running out of memory due to field data usage. We added two data nodes and restarted the one that died. After a couple of hours, our cluster was approaching green and was down to a single remaining shard. However, that shard does not seem to finish and is stuck in the INITIALIZING state. It is one of the daily shards, dated today. Is there anything I can do to force this shard onto a node or restart its initialization process?

Additionally, it seems that our cluster isn't really rebalancing or distributing its load onto the new data nodes. Here is a screenshot of our Marvel display. As you can see, the load is extremely unbalanced across our data nodes.

Is this something that should just balance out over time?

warkolm · August 16, 2016, 10:43pm

Check _cat/recovery for that one shard, and _cat/tasks to see if any reallocations are pending.

Maxwell_Flanders · August 16, 2016, 10:51pm

It does show in recovery:

logs-2016.08.16 4 3539791 replica index 10.25.250.37 10.25.250.43 n/a n/a 82 86.6% 7422192764 71.6% 82 7422192764 0 0.0% 27723

And the tasks API call is actually throwing an error:

{
  "error" : {
    "root_cause" : [ {
      "type" : "illegal_argument_exception",
      "reason" : "No feature for name [tasks]"
    } ],
    "type" : "illegal_argument_exception",
    "reason" : "No feature for name [tasks]"
  },
  "status" : 400
}

warkolm · August 16, 2016, 11:14pm

What version are you on?

With the replica, keep an eye on that _cat endpoint as the first and second percentage values should be increasing.
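For anyone who wants to track those two percentages without reading the raw row by eye, here is a minimal Python sketch that parses one line of `_cat/recovery` output. The column names are an assumption based on the 18-column layout that the row posted above appears to follow (Elasticsearch 2.x style); check them against `_cat/recovery?v` on your own version. The helper names `parse_recovery_row` and `percent` are made up for this example.

```python
# Column names are an ASSUMPTION (Elasticsearch 2.x style _cat/recovery layout).
# Run `_cat/recovery?v` on your own cluster to confirm the headers.
COLUMNS = [
    "index", "shard", "time", "type", "stage", "source_host", "target_host",
    "repository", "snapshot", "files", "files_percent", "bytes",
    "bytes_percent", "total_files", "total_bytes", "translog",
    "translog_percent", "total_translog",
]


def parse_recovery_row(line):
    """Split one whitespace-delimited _cat/recovery row into a dict."""
    values = line.split()
    if len(values) != len(COLUMNS):
        raise ValueError(
            "expected %d columns, got %d" % (len(COLUMNS), len(values))
        )
    return dict(zip(COLUMNS, values))


def percent(value):
    """Convert a string such as '86.6%' to the float 86.6."""
    return float(value.rstrip("%"))


# The row posted earlier in this thread.
sample = (
    "logs-2016.08.16 4 3539791 replica index 10.25.250.37 10.25.250.43 "
    "n/a n/a 82 86.6% 7422192764 71.6% 82 7422192764 0 0.0% 27723"
)

row = parse_recovery_row(sample)
print(row["stage"], percent(row["files_percent"]), percent(row["bytes_percent"]))
```

Running the same parse on successive snapshots of the output and comparing the two percentages shows whether the recovery is still making progress or has genuinely stalled.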

Maxwell_Flanders · August 26, 2016, 2:21am

Just a follow-up on this: that one shard DID in fact eventually allocate. It just blew my mind because it took around 3 hours for that one 8 GB shard to process, which was unprecedented, since the rest of our database (2+ TB) had managed to reallocate over the course of the day.

It did finish, however, and our cluster is back together.

Thank you, Mark! The advice was very helpful.
