How to handle many indices?

Hello,

I'm using Elasticsearch 5.6.2 on a cluster with 16 nodes, more than 800 indices, and about 20,000 shards.

If I change the configuration and restart Elasticsearch, log messages like the following continue for a long time (more than 2 hours), and I can't access Kibana due to timeouts.

Master node log:

[2017-10-05T13:36:22,710][WARN ][o.e.c.a.s.ShardStateAction] [somehost] [somelog-2017.10.05][11] received shard failed for shard id [[somelog-2017.10.05][11]], allocation id [r8Z9jnqlTf6PfLzOLEleYQ], primary term [4], message [mark copy as stale]
[2017-10-05T13:36:22,711][WARN ][o.e.c.a.s.ShardStateAction] [somehost] [somelog-2017.10.05][5] received shard failed for shard id [[somelog-2017.10.05][5]], allocation id [z3-KDmd-SMytd547Gj7P_Q], primary term [2], message [mark copy as stale]
[2017-10-05T13:36:22,710][WARN ][o.e.c.a.s.ShardStateAction] [somehost] [somelog-2017.10.05][8] received shard failed for shard id [[somelog-2017.10.05][8]], allocation id [3qeFDJAZTyy0ActIVAKKFA], primary term [2], message [mark copy as stale]
[2017-10-05T13:36:22,841][DEBUG][o.e.a.a.i.m.p.TransportPutMappingAction] [somehost] failed to put mappings on indices [[[somelog4-2017.10.05/QgSiWzbLTxWUiNJTyLjGpw]]], type [fluentd]
org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (put-mapping) within 30s
        at org.elasticsearch.cluster.service.ClusterService$ClusterServiceTaskBatcher.lambda$null$0(ClusterService.java:255) ~[elasticsearch-5.6.2.jar:5.6.2]
        at java.util.ArrayList.forEach(ArrayList.java:1249) ~[?:1.8.0_131]
        at org.elasticsearch.cluster.service.ClusterService$ClusterServiceTaskBatcher.lambda$onTimeout$1(ClusterService.java:254) ~[elasticsearch-5.6.2.jar:5.6.2]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:569) [elasticsearch-5.6.2.jar:5.6.2]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_131]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_131]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_131]
[2017-10-05T13:36:23,390][WARN ][o.e.c.a.s.ShardStateAction] [somehost] [somelog2-2017.10.05][14] received shard failed for shard id [[somelog2-2017.10.05][14]], allocation id [TiSxoFI_Q56FcbXRtcoOUw], primary term [3], message [mark copy as stale]
[2017-10-05T13:36:23,397][WARN ][o.e.c.a.s.ShardStateAction] [somehost] [somelog2-2017.10.05][2] received shard failed for shard id [[somelog2-2017.10.05][2]], allocation id [z0e9WxrnSiqRP0ccuFW12g], primary term [5], message [mark copy as stale]
[2017-10-05T13:36:23,421][WARN ][o.e.c.a.s.ShardStateAction] [somehost] [somelog3-2017.10.05][13] received shard failed for shard id [[somelog3-2017.10.05][13]], allocation id [NIV27Oe2RpOZSTAA82WdQA], primary term [5], message [mark copy as stale]
[2017-10-05T13:36:23,422][WARN ][o.e.c.a.s.ShardStateAction] [somehost] [somelog3-2017.10.05][9] received shard failed for shard id [[somelog3-2017.10.05][9]], allocation id [ADnldsuiQ6O29CHpGDTI5Q], primary term [2], message [mark copy as stale]
[2017-10-05T13:36:23,424][WARN ][o.e.c.a.s.ShardStateAction] [somehost] [somelog3-2017.10.05][1] received shard failed for shard id [[somelog3-2017.10.05][1]], allocation id [CvUkq-VyR9eOcXreVFynOQ], primary term [3], message [mark copy as stale]
[2017-10-05T13:36:23,424][WARN ][o.e.c.a.s.ShardStateAction] [somehost] [somelog3-2017.10.05][0] received shard failed for shard id [[somelog3-2017.10.05][0]], allocation id [dh5xKeCxToieP2UCsZ7hyw], primary term [4], message [mark copy as stale]

After waiting more than 2 hours, the cluster state becomes green and I can access Kibana again. But maintaining Elasticsearch has become a struggle because of this problem.
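
One idea I am considering is to disable shard allocation around the restart, as the reference manual recommends for full cluster restarts. A minimal sketch of what I mean, assuming the default localhost:9200 endpoint (these are illustrative commands, not something I have run yet):

# Disable allocation before shutting the nodes down. Persistent rather
# than transient, because transient settings do not survive a full
# cluster restart:
curl -XPUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '
{
    "persistent": {
        "cluster.routing.allocation.enable": "none"
    }
}'

# ... restart the nodes ...

# Re-enable allocation once all nodes have rejoined the cluster:
curl -XPUT 'localhost:9200/_cluster/settings' -H 'Content-Type: application/json' -d '
{
    "persistent": {
        "cluster.routing.allocation.enable": "all"
    }
}'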

I know Elasticsearch's cluster state management is single-threaded for simplicity, but are there any ideas for reducing the maintenance time?

For example, increasing cluster.routing.allocation.node_concurrent_recoveries. I tried to send a PUT request to raise it, but the request timed out, so I couldn't apply it (a sketch of the request is below).
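
For reference, this is the kind of request I mean. It is only a sketch with an illustrative value of 6, assuming the default endpoint; the master_timeout parameter just lets the request wait longer than the default 30s for the busy master:

# Raise the per-node recovery concurrency; master_timeout gives the
# overloaded master more time before the client gives up:
curl -XPUT 'localhost:9200/_cluster/settings?master_timeout=5m' \
     -H 'Content-Type: application/json' -d '
{
    "transient": {
        "cluster.routing.allocation.node_concurrent_recoveries": 6
    }
}'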

The cluster allocation explain API result is as follows:
#curl -s -XGET 'localhost:9200/_cluster/allocation/explain' | python -m json.tool

{
    "allocate_explanation": "allocation temporarily throttled",
    "can_allocate": "throttled",
    "current_state": "unassigned",
    "index": "...-2017.10.05",
    "node_allocation_decisions": [
        {
            "deciders": [
                {
                    "decider": "throttling",
                    "decision": "THROTTLE",
                    "explanation": "reached the limit of incoming shard recoveries [2], cluster setting [cluster.routing.allocation.node_concurrent_incoming_recoveries=2] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries])"
                }
            ],
            "node_attributes": {
                "ml.enabled": "true",
                "ml.max_open_jobs": "10"
            },
            "node_decision": "throttled",
            "node_id": "...",
            "node_name": "...",
            "transport_address": "...:9300"
        },
        {
            "deciders": [
                {
                    "decider": "throttling",
                    "decision": "THROTTLE",
                    "explanation": "reached the limit of incoming shard recoveries [2], cluster setting [cluster.routing.allocation.node_concurrent_incoming_recoveries=2] (can also be set via [cluster.routing.allocation.node_concurrent_recoveries])"
                }
            ],
            "node_attributes": {
                "ml.enabled": "true",
                "ml.max_open_jobs": "10"
            },
            "node_decision": "throttled",
            "node_id": "...",
            "node_name": "...",
            "transport_address": "...:9300"
        },
        ...
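
While the cluster is in this state, it may also help to look at the master's queue of pending cluster state updates; since updates are processed one at a time, a long backlog there would explain the put-mapping timeouts in the log above. A sketch, assuming the default endpoint:

# List the cluster state update tasks currently queued on the master:
curl -s 'localhost:9200/_cat/pending_tasks?v'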

How much data do you have in the cluster?

More than 2 billion documents.

What is the average shard size?

The average is about 170 MB, but the sizes are skewed. The biggest shard is 200 GB.

I recommend that you look at this video from @pmusa.

That is a very small average shard size, which can be very inefficient. The max shard size, however, is quite a bit over what we usually recommend, so I would revisit how you organise your data into shards and indices in order to reduce the number of indices and shards and get a smaller spread in shard sizes. (A sketch for finding the largest shards is below.)
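
To see where the skew comes from, something like the following lists the largest shards first. This is just a sketch assuming the default endpoint (cat API sorting is available from 5.1 onwards):

# Header line plus the 20 largest shards, sorted by on-disk size:
curl -s 'localhost:9200/_cat/shards?v&h=index,shard,prirep,store&s=store:desc' | head -n 21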

Thank you for your comment!
Hmm, I will revisit this...
