Elasticsearch version:
> "version" : { "number" : "1.4.1", "build_hash" : "89d3241d670db65f994242c8e8383b169779e2d4", "build_timestamp" : "2014-11-26T15:49:29Z", "build_snapshot" : false, "lucene_version" : "4.10.2" }
JVM version:
> java version "1.8.0_40"
> Java(TM) SE Runtime Environment (build 1.8.0_40-b25)
> Java HotSpot(TM) 64-Bit Server VM (build 25.40-b25, mixed mode)
OS version:
> Linux Ubuntu
Description of the problem including expected versus actual behavior:
When a node leaves the cluster, a lot of shards get stuck in the INITIALIZING state instead of finishing recovery and becoming STARTED as expected.
Steps to reproduce:
1. Disable shard allocation
2. Stop Elasticsearch on one of the nodes
3. Re-enable shard allocation (a sketch of the commands follows this list)
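A minimal sketch of those three steps, using the standard cluster settings API and the Ubuntu service script (host, port, and exact invocation are illustrative, not a verbatim capture):

```sh
# Step 1: disable shard allocation before taking the node down (illustrative host/port)
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.enable": "none" }
}'

# Step 2: stop Elasticsearch on the node being taken out
sudo service elasticsearch stop

# Step 3: re-enable allocation; this is when the shards pile up in INITIALIZING
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": { "cluster.routing.allocation.enable": "all" }
}'
```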
Provide logs (if relevant):
Here is the output of _cat/pending_tasks | head:
> 01439777 56.5m URGENT shard-started ([error-newsflickss][0], node[xirsEXbZSpqldjylVrwzjw], [R], s[INITIALIZING]), reason [after recovery (replica) from node [[30GB_1TB_ComputeNodeNew13][4DtklsTUSd-eRhPG_c3uyw][ip-172-31-37-168][inet[/172.31.37.168:9300]]{master=false}]]
> 1439778 56.5m URGENT shard-started ([firstcrytest-notificationclickedmoe][0], node[xirsEXbZSpqldjylVrwzjw], [R], s[INITIALIZING]), reason [after recovery (replica) from node [[ES_r3_xlarge_1TB_Node_new15][sbwLxGR6RpmFHmHv3Vc6ag][ip-172-31-47-74][inet[/172.31.47.74:9300]]{master=false}]]
> 1439779 56.5m URGENT shard-started ([cleartrip-eamgc][0], node[Nm_bLbaGQWC0u0ofkIjWIw], [R], s[INITIALIZING]), reason [after recovery (replica) from node [[ES_r3_xlarge_1TB_Node_2][xirsEXbZSpqldjylVrwzjw][ip-172-31-41-73][inet[/172.31.41.73:9300]]{master=false}]]
> 1439783 56.5m URGENT shard-started ([emt-uat-fundtransfer][0], node[Nm_bLbaGQWC0u0ofkIjWIw], [R], s[INITIALIZING]), reason [after recovery (replica) from node [[30GB_1TB_ComputeNode11][jpNMJUlwQ7aQKk-hlcWLVQ][ip-172-31-40-207][inet[/172.31.40.207:9300]]{master=false}]]
> 1439822 56.4m URGENT shard-started ([firstcrytest-notificationclickedmoe][0], node[xirsEXbZSpqldjylVrwzjw], [R], s[INITIALIZING]), reason [master [ESMasterNode3][8E9mg0rZSHKITLFvvDDT2g][ip-172-31-46-130][inet[/172.31.46.130:9300]]{data=false, master=true} marked shard as initializing, but shard state is [POST_RECOVERY], mark shard as started]
> 1439780 56.5m URGENT shard-started ([chillr-requestshowqr][0], node[xirsEXbZSpqldjylVrwzjw], [R], s[INITIALIZING]), reason [after recovery (replica) from node [[30GB_1TB_ComputeNode11][jpNMJUlwQ7aQKk-hlcWLVQ][ip-172-31-40-207][inet[/172.31.40.207:9300]]{master=false}]]
> 1439799 56.4m URGENT shard-started ([sdsellerzone-catalogpdpback][0], node[Nm_bLbaGQWC0u0ofkIjWIw], [R], s[INITIALIZING]), reason [master [ESMasterNode3][8E9mg0rZSHKITLFvvDDT2g][ip-172-31-46-130][inet[/172.31.46.130:9300]]{data=false, master=true} marked shard as initializing, but shard state is [POST_RECOVERY], mark shard as started]
> 1439833 56.4m URGENT reroute_after_cluster_update_settings
> 1439824 56.4m URGENT shard-started ([cleartripprod-fbtan][0], node[xirsEXbZSpqldjylVrwzjw], [R], s[INITIALIZING]), reason [master [ESMasterNode3][8E9mg0rZSHKITLFvvDDT2g][ip-172-31-46-130][inet[/172.31.46.130:9300]]{data=false, master=true} marked shard as initializing, but shard state is [POST_RECOVERY], mark shard as started]
> 1439288 5.9h HIGH refresh-mapping [cleartripprod-apppl-2016-06-30][[datapoints]]
_cat/health:
> epoch      timestamp cluster           status node.total node.data shards   pri relo init unassign
> 1468455843 00:24:03  DataPointsCluster yellow         22        19  26619 14017    0   20     1395
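The initializing and unassigned shards counted in the health output can be listed with the cat APIs; a sketch of the kind of commands used to inspect them (host and grep filter are illustrative):

```sh
# List shards that are not STARTED (replicas stuck in INITIALIZING show up here)
curl -s 'http://localhost:9200/_cat/shards' | grep -v STARTED | head

# Per-shard recovery progress
curl -s 'http://localhost:9200/_cat/recovery' | head
```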
_cluster/settings:
{
  "persistent": {
    "cluster": {
      "routing": {
        "allocation": {
          "cluster_concurrent_rebalance": "10",
          "node_concurrent_recoveries": "14",
          "node_initial_primaries_recoveries": "4",
          "enable": "all"
        }
      }
    },
    "threadpool": {
      "bulk": {
        "keep_alive": "2m",
        "size": "16",
        "queue_size": "2000",
        "type": "fixed"
      }
    },
    "indices": {
      "recovery": {
        "concurrent_streams": "6",
        "max_bytes_per_sec": "120mb"
      }
    }
  },
  "transient": {
    "cluster": {
      "routing": {
        "allocation": {
          "node_initial_primaries_recoveries": "10",
          "balance": {
            "index": "0.80f"
          },
          "enable": "all",
          "allow_rebalance": "indices_all_active",
          "cluster_concurrent_rebalance": "0",
          "node_concurrent_recoveries": "5",
          "exclude": {
            "_ip": "172.31.39.58"
          }
        }
      }
    },
    "indices": {
      "recovery": {
        "concurrent_streams": "10"
      }
    }
  }
}
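For completeness, the transient overrides above were applied through the cluster settings API; a minimal sketch using the flat setting keys, with values mirroring the dump (host is illustrative):

```sh
# A sketch of applying the transient recovery/allocation overrides shown above
curl -XPUT 'http://localhost:9200/_cluster/settings' -d '{
  "transient": {
    "cluster.routing.allocation.node_concurrent_recoveries": "5",
    "cluster.routing.allocation.cluster_concurrent_rebalance": "0",
    "cluster.routing.allocation.exclude._ip": "172.31.39.58",
    "indices.recovery.concurrent_streams": "10"
  }
}'
```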
Is this happening because the cluster has too many shards (26619 shards across 19 data nodes)?