One more thing I want to ask: will there be any issues when a node restarts while rebalancing is happening and active traffic keeps flowing, e.g. if we try to write to an index that is being recovered? I want to know about the ES side; I know the client's write will fail and that's okay, but is there any problem on the server side?
Looking at just one unassigned shard, [logs-2018.12.04.07][1], I see this:
[2018-12-04 08:14:03,184][TRACE][gateway ] [metrics-master-2] [[logs-2018.12.04.07][1], node[null], [P], v[0], s[UNASSIGNED], unassigned_info[[reason=CLUSTER_RECOVERED], at[2018-12-04T08:13:54.398Z]]] on node [{metrics-datastore-2}{qQ995p5ERmS0O5o7yK3VtA}{192.168.13.70}{192.168.13.70:9300}{max_local_storage_nodes=1, master=false}] has version [-1] of shard
[2018-12-04 08:14:03,184][TRACE][gateway ] [metrics-master-2] [[logs-2018.12.04.07][1], node[null], [P], v[0], s[UNASSIGNED], unassigned_info[[reason=CLUSTER_RECOVERED], at[2018-12-04T08:13:54.398Z]]] on node [{metrics-datastore-1}{C279DcEfRDeqr2wDgJF5bQ}{192.168.13.214}{192.168.13.214:9300}{max_local_storage_nodes=1, master=false}] has version [6] of shard
[2018-12-04 08:14:03,184][TRACE][gateway ] [metrics-master-2] [[logs-2018.12.04.07][1], node[null], [P], v[0], s[UNASSIGNED], unassigned_info[[reason=CLUSTER_RECOVERED], at[2018-12-04T08:13:54.398Z]]] on node [{metrics-datastore-0}{ZxDL21BbStCXGRD2GVieNA}{192.168.13.17}{192.168.13.17:9300}{max_local_storage_nodes=1, master=false}] has version [-1] of shard
[2018-12-04 08:14:03,184][TRACE][gateway ] [metrics-master-2] [logs-2018.12.04.07][1], node[null], [P], v[0], s[UNASSIGNED], unassigned_info[[reason=CLUSTER_RECOVERED], at[2018-12-04T08:13:54.398Z]] candidates for allocation: [[metrics-datastore-1] -> 6, ]
[2018-12-04 08:14:03,184][DEBUG][gateway ] [metrics-master-2] [logs-2018.12.04.07][1] found 1 allocations of [logs-2018.12.04.07][1], node[null], [P], v[0], s[UNASSIGNED], unassigned_info[[reason=CLUSTER_RECOVERED], at[2018-12-04T08:13:54.398Z]], highest version: [6]
[2018-12-04 08:14:03,184][DEBUG][gateway ] [metrics-master-2] [logs-2018.12.04.07][1]: not allocating, number_of_allocated_shards_found [1]
I think this tells us that the master is looking for at least two copies of this shard, but only found one, on metrics-datastore-1. There should have been copies on one of the other two data nodes, but it looks like they were unassigned before or during the shutdown of the cluster.
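For reference, since the allocation explain API only arrived in ES 5.0, on 2.4 the cat APIs (plus the TRACE logging above) are the way to see which copies the master can find. A minimal check, assuming the cluster is reachable on localhost:9200:

    # Plain-text overview of every shard copy of the affected index,
    # including which node holds it and its state
    curl -s 'localhost:9200/_cat/shards/logs-2018.12.04.07?v'

    # Show only the shards that never got assigned after the restart
    curl -s 'localhost:9200/_cat/shards?v' | grep UNASSIGNED

If a primary shows up as UNASSIGNED here while only one data node reports a copy in the gateway logs, that matches the "not allocating, number_of_allocated_shards_found [1]" message above.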
But in my setup the shards are always assigned and started before I restart. According to you, this could happen in a situation where a new index is being created and its shards are yet to be allocated.
Before restarting I always ensure the cluster is in a green state, with 100% active shards and everything assigned; there are no unassigned shards.
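For completeness, this is a sketch of the full-cluster-restart procedure the 2.x docs recommend, assuming the API is reachable on localhost:9200. Disabling allocation first stops the master from shuffling copies around while nodes go down, and the synced flush (available since ES 1.6) speeds up recovery of unchanged shards:

    # Before the restart: stop shard allocation
    curl -XPUT 'localhost:9200/_cluster/settings' -d '{
      "transient": { "cluster.routing.allocation.enable": "none" }
    }'
    # Synced flush so unchanged shards recover quickly
    curl -XPOST 'localhost:9200/_flush/synced'

    # ... restart the nodes, wait for them all to rejoin ...

    # Re-enable allocation
    curl -XPUT 'localhost:9200/_cluster/settings' -d '{
      "transient": { "cluster.routing.allocation.enable": "all" }
    }'
    # Block until the cluster is green again
    curl 'localhost:9200/_cluster/health?wait_for_status=green&timeout=120s'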
And yet according to the logs the copies aren't there when the nodes come back after the restart. Apart from upgrading I don't really know what else to suggest. Maybe someone else can help.
We are trying to figure out the root cause, in case we cannot fix it on ES 2.4.
The last question I have: we run the ES nodes as Kubernetes containers. When they restart, the IP of each container changes, so the unicast addresses change as well, and we update them as part of the pod restarts. So when the ES cluster comes back up, it comes up with new unicast IPs.
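One way to take the changing pod IPs out of the picture, which common ES-on-Kubernetes setups for 2.x use, is to point unicast discovery at a stable DNS name instead of rewriting IP lists on every restart. A sketch, where "elasticsearch-discovery" is a hypothetical headless Service that resolves to the current master-eligible pods:

    # elasticsearch.yml -- sketch, service name is an assumption
    discovery.zen.ping.unicast.hosts: ["elasticsearch-discovery"]
    # quorum of master-eligible nodes: (master_eligible / 2) + 1
    discovery.zen.minimum_master_nodes: 2

With this, a restarted pod pings whatever nodes the Service currently resolves to, so the unicast list never goes stale.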
We also made a setup in AWS without Kubernetes containers, a 6-node system, created some indices and restarted all the nodes together. There the IPs don't change; they are Elastic IPs.
So could this matter in any way? This is the final suspect we have.