Data node not getting shards automatically

I have an Elasticsearch cluster with the following configuration.

All versions are ES 6.4.2

es1 - coordinating node only (also runs Logstash and Kibana)
es2 - master node, data node, ingest node
es3 - master node, data node, ingest node
es4 - master node, data node, ingest node
es5 - data node only

es1 just handles the interface into the cluster, and it does its job fine.

es2/3/4 work properly, shards get allocated and move around to balance.

I brought up es5 to act as a data node (with plans to bring up more shortly), but I've been running it all day and the cluster doesn't seem to be balancing shards onto it.
Shard allocation is turned on.
I can manually allocate a shard to it and it sticks.
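
For anyone following along, here is a minimal Python sketch of the two checks above: reading the allocation and rebalance switches out of a `GET /_cluster/settings?flat_settings=true` response, and building the body for a manual move via `POST /_cluster/reroute`. It only builds and reads JSON and does not contact a cluster. The index name and the settings dict are made-up example values, and the helper names are mine, not part of any client library.

```python
# Sketch only: reads parsed JSON and builds a request body; no network calls.

# Elasticsearch defaults for these two switches when nothing is set explicitly.
DEFAULTS = {
    "cluster.routing.allocation.enable": "all",
    "cluster.routing.rebalance.enable": "all",
}


def effective_setting(settings, key):
    """Return the value in force for `key`.

    `settings` is the parsed JSON from
    GET /_cluster/settings?flat_settings=true.
    Transient settings take precedence over persistent ones.
    """
    for scope in ("transient", "persistent"):
        value = settings.get(scope, {}).get(key)
        if value is not None:
            return value
    return DEFAULTS[key]


def move_command(index, shard, from_node, to_node):
    """Body for POST /_cluster/reroute that moves one started shard copy."""
    return {
        "commands": [
            {
                "move": {
                    "index": index,
                    "shard": shard,
                    "from_node": from_node,
                    "to_node": to_node,
                }
            }
        ]
    }


# Hypothetical example values, not taken from the real cluster.
settings = {
    "persistent": {"cluster.routing.allocation.enable": "all"},
    "transient": {},
}
print(effective_setting(settings, "cluster.routing.allocation.enable"))
print(effective_setting(settings, "cluster.routing.rebalance.enable"))
print(move_command("logs-2018.11.01", 0, "es2", "es5"))
```

Note that allocation and rebalancing are separate switches, so allocation being on does not by itself show that rebalancing is enabled.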

The cluster holds daily indices of about 200GB/day, each with 5 primary shards and 1 replica.

I manually put two shards on it and they are still there as primaries.
Apart from that, I did a rolling reboot of the other nodes, and for some reason the cluster put all the replica shards for today's (and only today's) daily index on es5 after es2 and then es3 rebooted. Those replicas then took over as primaries for today's index when es4 rebooted. I have 30 days of daily indices and it hasn't moved any other shards over. Out of ~150 shards, es5 is holding 7 (the 2 I manually placed there and the 5 primaries for today's daily index).
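
The per-node counts above can be tallied from the output of `GET /_cat/shards?format=json`. Below is a rough Python sketch, assuming the rows have already been fetched and parsed; the sample rows and index names are invented for illustration. The "even share" figure is only a yardstick: the balancer also weighs per-index balance, a threshold, and disk watermarks, so it does not aim for exactly equal counts.

```python
from collections import Counter


def shards_per_node(rows):
    """Count started shard copies per node.

    `rows` is the parsed JSON list from GET /_cat/shards?format=json.
    Unassigned shards have no node and are skipped, as are copies
    that are not in the STARTED state.
    """
    counts = Counter()
    for row in rows:
        if row.get("state") == "STARTED" and row.get("node"):
            counts[row["node"]] += 1
    return counts


def even_share(total_shards, data_nodes):
    """Shard copies per node if they were spread perfectly evenly."""
    return total_shards / data_nodes


# Hypothetical sample rows, not real output from the cluster.
rows = [
    {"index": "logs-2018.11.01", "shard": "0", "prirep": "p", "state": "STARTED", "node": "es5"},
    {"index": "logs-2018.11.01", "shard": "0", "prirep": "r", "state": "STARTED", "node": "es2"},
    {"index": "logs-2018.10.31", "shard": "1", "prirep": "p", "state": "STARTED", "node": "es3"},
    {"index": "logs-2018.10.31", "shard": "1", "prirep": "r", "state": "UNASSIGNED", "node": None},
]

for node, count in sorted(shards_per_node(rows).items()):
    print(node, count)

# Rough yardstick using the numbers from the post: ~150 shards, 4 data nodes.
print(even_share(150, 4))
```

With roughly 150 shards over 4 data nodes, an even spread would be somewhere around 37 per node, so 7 on es5 is well short of that.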

Am I misunderstanding something about data nodes? Are they not also supposed to hold data and take part in balancing the cluster?


Update, in case anyone can help figure out what happened, though I've since destroyed the environment...

I fixed the problem though I don't know how or why...

es3 had a disk die on us last week. It was replaced and brought back up. Everything seemed fine, but I failed to notice the searchguard index was in a yellow state as it didn't like es3 for some reason, though the other indices were doing fine.

I fired up cerebro and saw a searchguard replica in an unallocated state and not present on es3.
I couldn't get it to do anything (even tried deleting the whole searchguard index and reinstalling with sgadmin). So I shut down es3, waited for all the data to move off, and started es3 as a coordinating node (data/master/ingest all set to false) and let it join the cluster. Searchguard fixed itself. I brought es3 back down, switched data/ingest/master back to true, and brought it back up, and now everything is working fine. The cluster is now balancing to es5 and everything is green.
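
For reference, an unassigned replica like this is the kind of case the cluster allocation explain API (`_cluster/allocation/explain`) is meant for. The Python sketch below builds the request body and pulls out, per node, the deciders that answered NO. The shard number 0 and the whole sample response are assumptions made up for illustration (including the node names and decider values); they are not what this cluster actually returned.

```python
def explain_request(index, shard, primary):
    """Body for GET /_cluster/allocation/explain for one shard copy."""
    return {"index": index, "shard": shard, "primary": primary}


def blocking_deciders(explain_response):
    """Map node name -> names of deciders that said NO for that node.

    `explain_response` is the parsed JSON returned by the explain API.
    Nodes with no NO decision are left out of the result.
    """
    blocked = {}
    for node in explain_response.get("node_allocation_decisions", []):
        reasons = [
            d["decider"]
            for d in node.get("deciders", [])
            if d.get("decision") == "NO"
        ]
        if reasons:
            blocked[node["node_name"]] = reasons
    return blocked


# Ask about the replica copy of shard 0 (shard number is an assumption).
print(explain_request("searchguard", 0, False))

# Hypothetical response, trimmed to the fields used above. Illustrative only.
sample_response = {
    "index": "searchguard",
    "shard": 0,
    "primary": False,
    "current_state": "unassigned",
    "node_allocation_decisions": [
        {
            "node_name": "node-a",
            "node_decision": "no",
            "deciders": [
                {"decider": "same_shard", "decision": "NO", "explanation": "(placeholder)"}
            ],
        },
        {
            "node_name": "node-b",
            "node_decision": "yes",
            "deciders": [],
        },
    ],
}
print(blocking_deciders(sample_response))
```

The explanation text attached to each decider is usually the part worth reading, since it states why a given node was rejected.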

Something probably got stuck somewhere, but it's working now. I just wrote it all down for reference.

