We have 3 nodes. The nodes are not coming back up after a restart. I get the following errors:
nested: RecoverFilesRecoveryException[Failed to transfer [0] files with total size of [0b]]; nested: IllegalStateException[try to recover [csv3][1] from primary shard with sync id but number of docs differ: 471350 (node01, primary) vs 471351(node2)]; ]
Caused by: RemoteTransportException[[node2][172.16.168.93:9300][internal:index/shard/recovery/start_recovery]]; nested: RecoveryEngineException[Phase[1] phase1 failed]; nested: RecoverFilesRecoveryException[Failed to transfer [0] files with total size of [0b]]; nested: IllegalStateException[try to recover [csv3][1] from primary shard with sync id but number of docs differ: 471350 (node01, primary) vs 471351(node2)];
Caused by: [csv3][[csv3][1]] RecoveryEngineException[Phase[1] phase1 failed]; nested: RecoverFilesRecoveryException[Failed to transfer [0] files with total size of [0b]]; nested: IllegalStateException[try to recover [csv3][1] from primary shard with sync id but number of docs differ: 471350 (node01, primary) vs 471351(node2)];
at org.elasticsearch.indices.recovery.RecoverySourceHandler.recoverToTarget(RecoverySourceHandler.java:135)
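For anyone reading the trace: the primary and the replica of [csv3][1] carry the same sync id, so recovery skips copying files (hence "Failed to transfer [0] files"), but the document counts differ, so recovery is aborted. A rough Python sketch that pulls the numbers out of such a message follows; the regex and the helper name parse_mismatch are my own, written against the message format quoted above, not anything shipped with Elasticsearch.

```python
import re

# Pattern written against the IllegalStateException text quoted in this thread.
MISMATCH = re.compile(
    r"try to recover \[(?P<index>[^\]]+)\]\[(?P<shard>\d+)\] from primary shard "
    r"with sync id but number of docs differ: "
    r"(?P<primary_docs>\d+) \((?P<primary_node>[^,]+), primary\) vs "
    r"(?P<replica_docs>\d+)\s*\((?P<replica_node>[^)]+)\)"
)


def parse_mismatch(message):
    """Return index, shard, nodes and doc counts from the message, or None."""
    m = MISMATCH.search(message)
    if m is None:
        return None
    info = m.groupdict()
    for key in ("shard", "primary_docs", "replica_docs"):
        info[key] = int(info[key])
    # Positive value: the replica holds more documents than the primary.
    info["extra_on_replica"] = info["replica_docs"] - info["primary_docs"]
    return info


sample = (
    "IllegalStateException[try to recover [csv3][1] from primary shard with sync id "
    "but number of docs differ: 471350 (node01, primary) vs 471351(node2)]"
)
print(parse_mismatch(sample))
```

Here the replica copy on node2 has one document more than the primary on node01, which is why that replica cannot be brought back from its existing files.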
Any idea how to fix this, please?
Can you check /_cluster/health?pretty and see whether there are any unassigned_shards?
Yes.
curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'
{
"cluster_name" : "ixxx",
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 3,
"number_of_data_nodes" : 3,
"active_primary_shards" : 3,
"active_shards" : 5,
"relocating_shards" : 0,
"initializing_shards" : 2,
"unassigned_shards" : 2,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 55.55555555555556
}
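If it helps, here is a small Python sketch for reading that response. fetch_health and shard_summary are hypothetical helper names, and the localhost:9200 address is just the one from the curl command above. The percentage calculation is my assumption about how the figure is derived; it does reproduce the value in the response.

```python
import json
from urllib.request import urlopen


def fetch_health(base_url="http://localhost:9200"):
    """GET /_cluster/health and return the parsed JSON (needs a reachable node)."""
    with urlopen(base_url + "/_cluster/health") as resp:
        return json.load(resp)


def shard_summary(health):
    """Count shards that are not active yet and recompute the active percentage."""
    not_active = health["initializing_shards"] + health["unassigned_shards"]
    total = health["active_shards"] + not_active
    percent = 100.0 * health["active_shards"] / total if total else 100.0
    return {"total": total, "not_active": not_active, "active_percent": percent}


# Values copied from the response above; call fetch_health() on a live cluster instead.
health = {
    "status": "yellow",
    "active_primary_shards": 3,
    "active_shards": 5,
    "initializing_shards": 2,
    "unassigned_shards": 2,
}
print(health["status"], shard_summary(health))
```

With these numbers, 4 of the 9 shard copies are not active. All 3 primaries are active, so the status is yellow rather than red and only replicas are missing.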
After a while I see 0 unassigned shards, but 3 initializing shards:
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 3,
"number_of_data_nodes" : 3,
"active_primary_shards" : 3,
"active_shards" : 6,
"relocating_shards" : 0,
"initializing_shards" : 3,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 66.66666666666666
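To judge whether recovery is moving or stuck, it can help to compare two health snapshots. A minimal Python sketch using the two sets of numbers posted in this thread; health_delta is a hypothetical helper name.

```python
def health_delta(before, after):
    """Return how the shard counters changed between two health snapshots."""
    keys = ("active_shards", "initializing_shards", "unassigned_shards")
    return {k: after[k] - before[k] for k in keys}


# Counters copied from the two health responses posted in this thread.
first = {"active_shards": 5, "initializing_shards": 2, "unassigned_shards": 2}
second = {"active_shards": 6, "initializing_shards": 3, "unassigned_shards": 0}

delta = health_delta(first, second)
print(delta)
# → {'active_shards': 1, 'initializing_shards': 1, 'unassigned_shards': -2}
```

Between the two snapshots one more shard became active and both unassigned shards were picked up, so some progress was made. If the initializing count then stays at 3 across many checks, /_cat/shards?v and /_cat/recovery?v should show which shard copies are not getting through recovery.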
If the shards are still initializing, then I guess the only thing you can try is restarting the nodes.
Please see this blog post:
https://t37.net/how-to-fix-your-elasticsearch-cluster-stuck-in-initializing-shards-mode.html
To avoid this situation, this is the permanent solution