Share Rebalancing on large clusters (2.4)


(Ed) #1

Hey guys got a question.

I have a cluster with over 20k total shards (each index is 30 shards + 1 replication and is oh about 300GB each) on 18 Data nodes with ~24 cores on each node. oh and we are indexing 10K message per second all day long (about 1TB a day of data)

  1. When _open a couple of indexes at a time the cluster re balances for a while
  2. when doing maintenance on a node it takes for ever for it to rebalance

When the re-routing/rebalancing/recovering is happening my indexing slows way down.

So here are my questions

  1. I know there are Heuristics on when the cluster chooses to re balance but I don't understand the meaning of the numbers so I am afraid to touch them. Any resources that can help describe this better (or should I look somewhere else)

https://www.elastic.co/guide/en/elasticsearch/reference/master/shards-allocation.html#_shard_balancing_heuristics

  1. I have looked at the Thread Queues but don't see any threads being maxed out during the re-balancing. and I have played with the concurrent load balancing settings at the cluster level. but doing it slow (concurrent rebalancing 2 or three) or fast at +30 seems to have the same impact.

https://www.elastic.co/guide/en/elasticsearch/reference/master/shards-allocation.html#_shard_allocation_settings


(Ed) #2
curl -XPUT $HOSTNAME:9200/_cluster/settings -d '{
"transient" : {
"indices.recovery.max_bytes_per_sec": "5000mb",
"indices.recovery.concurrent_streams": 8,
"cluster.routing.allocation.node_concurrent_recoveries": 8,
"cluster.routing.allocation.node_initial_primaries_recoveries": 4,
"cluster.routing.allocation.cluster_concurrent_rebalance":  8,
"index.unassigned.node_left.delayed_timeout": "1m",
"index.refresh_interval" : "5s",
"cluster.routing.allocation.enable" : "all",
"cluster.routing.allocation.allow_rebalance" : "always"

}
}'

(Mark Walkom) #3

Is the index is 300GB, or the shard?


(Ed) #4

The Index, ranges from 100GB - 300GB depending on the index. ( ~5 different indexes adding up to the 1TB a day)

the share should be be about 10GB for the 300GB index (300GB / 30 shards = 10gb) right?

we are thinking about adding shards because we are adding hosts.

We are ok with slow searches but can't deal with a backlog of indexing during recovery


(Ed) #5

no idea's?

well, I did find one issue as that I was CPU/IO Bound. I added 4 more luns and went from 50Mbps to 500Mbps so that will help once I finish spliting the 15 nodes.


(system) #6

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.