Too many unassigned replica shards every day

Hello,

We have an ELK cluster with 35 nodes (3 masters and 32 data nodes).
New indices are created once every day.
Approximately 280 indices are created each day.
Each index has 24 primary shards with a replication factor of 1.
Each data node has 15 TB of disk and 125 GB of RAM.

The problem is that every day, after the new indices are created, the cluster goes into a yellow state with thousands of unassigned shards and takes a long time to recover. The following is the state of the cluster 4.5 hours after the new indices were created.

{
"cluster_name" : <cluster_name>,
"status" : "yellow",
"timed_out" : false,
"number_of_nodes" : 35,
"number_of_data_nodes" : 32,
"active_primary_shards" : 48589,
"active_shards" : 89434,
"relocating_shards" : 0,
"initializing_shards" : 30,
"unassigned_shards" : 7760,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 1936,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 2036819,
"active_shards_percent_as_number" : 91.98757508434132
}
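
For reference, the output above is from the standard cluster health API; the host and port below are just an example of how we query it:

curl -XGET 'localhost:9200/_cluster/health?pretty'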

The reasons for shards being unassigned are the following:

INDEX_CREATED - on newly created indices
NODE_LEFT - on older indices

The following is part of the output from the _cluster/allocation/explain API:

...
{
"node_id" : "XZAy1ITXSjmO_a-8yulfig",
"node_name" : "node1",
"transport_address" : ":9300",
"node_attributes" : {
"ml.max_open_jobs" : "10",
"ml.enabled" : "true"
},
"node_decision" : "throttled",
"deciders" : [
{
"decider" : "throttling",
"decision" : "THROTTLE",
"explanation" : "reached the limit of incoming shard recoveries [10], cluster setting [cluster.routing.allocation.node_concurrent_incoming_recoveries=10] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries])"
}
]
},
{
"node_id" : "dnB6TExcSOaI_poL3d3t9g",
"node_name" : "node2",
"transport_address" : ":9300",
"node_attributes" : {
"ml.max_open_jobs" : "10",
"ml.enabled" : "true"
},
"node_decision" : "throttled",
"store" : {
"matching_sync_id" : true
},
"deciders" : [
{
"decider" : "throttling",
"decision" : "THROTTLE",
"explanation" : "reached the limit of incoming shard recoveries [10], cluster setting [cluster.routing.allocation.node_concurrent_incoming_recoveries=10] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries])"
}
]
},
{
"node_id" : "BrIKoXg4TkuJ2Ekn8YKuPQ",
"node_name" : "node3",
"transport_address" : ":9300",
"node_attributes" : {
"ml.max_open_jobs" : "10",
"ml.enabled" : "true"
},
"node_decision" : "no",
"store" : {
"matching_sync_id" : true
},
"deciders" : [
{
"decider" : "same_shard",
"decision" : "NO",
"explanation" : "the shard cannot be allocated to the same node on which a copy of the shard already exists [[logstash-hadoop_axonitered-hdfsproxy-audit-s3-2017.12.15][18], node[BrIKoXg4TkuJ2Ekn8YKuPQ], [P], s[STARTED], a[id=OzSdAIySQFWKj7PuzUHbjQ]]"
}
]
}
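
For reference, the explanation above can be reproduced for a single unassigned replica with the allocation explain API, and the throttling limit it mentions is a dynamic cluster setting. The shard below is taken from the output above, and the new limit of 20 is only an illustrative value; raising it increases the recovery load per node and does not address the underlying shard count.

# Explain why one specific replica is unassigned (shard taken from the output above).
curl -XGET 'localhost:9200/_cluster/allocation/explain?pretty' -H 'Content-Type: application/json' -d '
{
  "index": "logstash-hadoop_axonitered-hdfsproxy-audit-s3-2017.12.15",
  "shard": 18,
  "primary": false
}'

# Optionally raise the per-node recovery concurrency that the throttling decider refers to.
curl -XPUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '
{
  "transient": {
    "cluster.routing.allocation.node_concurrent_recoveries": 20
  }
}'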

We don't want the cluster to go into a yellow state every day, with thousands of shards unassigned and a long recovery. We want index creation and the distribution of both primary and replica shards to be smooth. Would increasing the number of shards per index help? If yes, by how much?

Also, what does NODE_LEFT mean? I don't see any services being restarted on the nodes.

That is way too many. 280 indices × 24 shards × 2 copies is roughly 13,440 new shards every day, and your health output shows close to 100,000 shards across 32 data nodes, around 3,000 per node. You need to reduce that quite dramatically.

How many shards do your indices have? You can try increasing cluster.routing.allocation.node_initial_primaries_recoveries to a larger value. Also output the pending tasks and see what is pending, as sketched below.
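
A minimal sketch of both suggestions, assuming a plain curl setup; the value of 8 is only an example (the default is 4):

# 1) See what is queued on the master.
curl -XGET 'localhost:9200/_cluster/pending_tasks?pretty'

# 2) Allow more primaries per node to recover in parallel after a restart.
curl -XPUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '
{
  "transient": {
    "cluster.routing.allocation.node_initial_primaries_recoveries": 8
  }
}'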

The following is a subset of the output from the pending tasks API:

{
"insert_order" : 180619,
"priority" : "URGENT",
"source" : "shard-started shard id [[logstash-hadoop_radiumtan-oozie-oozie-ops-s3-2017.12.17][5]], allocation id [2k_o8_i9RtO3zF6AaG_6yA], primary term [0], message [after peer recovery]",
"executing" : false,
"time_in_queue_millis" : 12994,
"time_in_queue" : "12.9s"
},
{
"insert_order" : 179771,
"priority" : "HIGH",
"source" : "shard-failed",
"executing" : false,
"time_in_queue_millis" : 438956,
"time_in_queue" : "7.3m"
},
{
"insert_order" : 179758,
"priority" : "HIGH",
"source" : "shard-failed",
"executing" : false,
"time_in_queue_millis" : 461314,
"time_in_queue" : "7.6m"
},
{
"insert_order" : 179772,
"priority" : "HIGH",
"source" : "shard-failed",
"executing" : false,
"time_in_queue_millis" : 438956,
"time_in_queue" : "7.3m"
},
{
"insert_order" : 179757,
"priority" : "HIGH",
"source" : "shard-failed",
"executing" : false,
"time_in_queue_millis" : 461485,
"time_in_queue" : "7.6m"
},
{
"insert_order" : 179751,
"priority" : "HIGH",
"source" : "shard-failed",
"executing" : false,
"time_in_queue_millis" : 462079,
"time_in_queue" : "7.7m"
},
{
"insert_order" : 179763,
"priority" : "HIGH",
"source" : "shard-failed",
"executing" : false,
"time_in_queue_millis" : 460810,
"time_in_queue" : "7.6m"
},
{
"insert_order" : 179765,
"priority" : "HIGH",
"source" : "shard-failed",
"executing" : false,
"time_in_queue_millis" : 458949,
"time_in_queue" : "7.6m"
},
{
"insert_order" : 179796,
"priority" : "HIGH",
"source" : "shard-failed",
"executing" : false,
"time_in_queue_millis" : 437708,
"time_in_queue" : "7.2m"
},

You have too many shards.

Does the number of shards per index need to be reduced? @warkolm

I suspect you need to reduce the number of indices as well as the number of shards. Please read this blog post about shards and sharding practices for some guidelines.
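
As a rough illustration only (the pattern, shard count, and template name below are placeholders, and the right numbers depend on your daily data volume), the per-index shard count for new daily indices can be lowered with an index template:

# Older 5.x releases use the "template" key; on 6.x+ use "index_patterns": ["logstash-*"] instead.
curl -XPUT 'localhost:9200/_template/logstash_fewer_shards' -H 'Content-Type: application/json' -d '
{
  "template": "logstash-*",
  "order": 1,
  "settings": {
    "index.number_of_shards": 4,
    "index.number_of_replicas": 1
  }
}'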

Excuse me for interrupting; I have a similar problem.

http://localhost:9200/_cat/shards

censos2 0 p STARTED 6608080 199.7mb 192.168.0.104 nodo2
censos2 0 r UNASSIGNED
.kibana 0 p STARTED 2 6.1kb 192.168.0.104 nodo2
.kibana 0 r UNASSIGNED
.marvel-es-data-1 0 p STARTED 4 4.5kb 192.168.0.104 nodo2
.marvel-es-data-1 0 r UNASSIGNED
.marvel-es-1-2017.12.21 0 p STARTED 95 147.5kb 192.168.0.104 nodo2
.marvel-es-1-2017.12.21 0 r UNASSIGNED
censos 3 p STARTED 26732520 760.9mb 192.168.0.105 nodo3
censos 3 r STARTED 26732520 760.9mb 192.168.0.104 nodo2
censos 4 p STARTED 26678994 717.3mb 192.168.0.105 nodo3
censos 4 r STARTED 26678994 717.3mb 192.168.0.104 nodo2
censos 1 p STARTED 26722913 688.5mb 192.168.0.105 nodo3
censos 1 r STARTED 26722913 688.5mb 192.168.0.104 nodo2
censos 2 p STARTED 26720830 701.6mb 192.168.0.105 nodo3
censos 2 r STARTED 26720830 701.6mb 192.168.0.104 nodo2
censos 0 p STARTED 26689213 695.7mb 192.168.0.105 nodo3
censos 0 r STARTED 26689213 695.7mb 192.168.0.104 nodo2

http://localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason

censos2 0 p STARTED
censos2 0 r UNASSIGNED CLUSTER_RECOVERED
.kibana 0 p STARTED
.kibana 0 r UNASSIGNED CLUSTER_RECOVERED
.marvel-es-data-1 0 p STARTED
.marvel-es-data-1 0 r UNASSIGNED INDEX_CREATED
.marvel-es-1-2017.12.21 0 p STARTED
.marvel-es-1-2017.12.21 0 r UNASSIGNED INDEX_CREATED
censos 3 p STARTED
censos 3 r STARTED
censos 4 p STARTED
censos 4 r STARTED
censos 1 p STARTED
censos 1 r STARTED
censos 2 p STARTED
censos 2 r STARTED
censos 0 p STARTED
censos 0 r STARTED

This did not happen before; it started a couple of weeks ago. Sometimes I fix it by first disabling shard allocation, then restarting the cluster, and then re-enabling shard allocation.

But every day, when the Marvel indices are automatically created again, the same thing happens. It is tedious to have to repeat this every day, and sometimes several times a day. Does anyone know a permanent solution so that this does not happen again?
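
For reference, the "re-enable allocation" step is just this cluster settings call (the host and port are simply how I reach my cluster):

curl -XPUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '
{
  "transient": {
    "cluster.routing.allocation.enable": "all"
  }
}'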

It'd be better if you created a new thread please 🙂

I'll try it. Thanks.

We reduced both the number of indices and the number of shards per index.

{
"cluster_name": "xxx",
"status": "green",
"timed_out": false,
"number_of_nodes": 35,
"number_of_data_nodes": 32,
"active_primary_shards": 4184,
"active_shards": 8414,
"relocating_shards": 0,
"initializing_shards": 0,
"unassigned_shards": 0,
"delayed_unassigned_shards": 0,
"number_of_pending_tasks": 0,
"number_of_in_flight_fetch": 0,
"task_max_waiting_in_queue_millis": 0,
"active_shards_percent_as_number": 100
}

The cluster has been very stable since then.
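
For anyone who hits the same problem: the important part was simply cutting the total shard count. As one possible way to deal with existing oversized indices (not necessarily exactly what we did; the node name and the target of 4 shards below are placeholders, and the target shard count must be a factor of the original 24), the _shrink API can rewrite an index with fewer primaries:

# 1) Block writes and co-locate a copy of every shard on one node (the node name is a placeholder).
curl -XPUT 'localhost:9200/logstash-hadoop_axonitered-hdfsproxy-audit-s3-2017.12.15/_settings' -H 'Content-Type: application/json' -d '
{
  "index.blocks.write": true,
  "index.routing.allocation.require._name": "node1"
}'

# 2) Shrink 24 primaries down to 4 in a new index.
curl -XPOST 'localhost:9200/logstash-hadoop_axonitered-hdfsproxy-audit-s3-2017.12.15/_shrink/logstash-hadoop_axonitered-hdfsproxy-audit-s3-2017.12.15-shrunk' -H 'Content-Type: application/json' -d '
{
  "settings": {
    "index.number_of_shards": 4,
    "index.number_of_replicas": 1
  }
}'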


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.