Hey guys, I've got a question.
I have a cluster with over 20k total shards (each index is 30 shards with 1 replica and is about 300GB) on 18 data nodes with ~24 cores on each node. We are also indexing 10K messages per second all day long (about 1TB of data a day).
- When I _open a couple of indexes at a time, the cluster rebalances for a while.
- When doing maintenance on a node, it takes forever for the cluster to rebalance.
While the rerouting/rebalancing/recovery is happening, my indexing slows way down.
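For a sense of scale, here is a rough back-of-the-envelope sketch of what those numbers work out to. It assumes "20k" is exactly 20,000 shards, that each index is 30 primaries plus one replica copy of each, that the 300GB figure is primary data only, and that 1TB is 10**12 bytes. None of that is exact, so treat the results as ballpark only.

```python
# Back-of-the-envelope numbers from the figures in this post.
# Assumptions (not exact): 20,000 shards total, 30 primaries + 1 replica
# copy per index, 300 GB is primary data only, 1 TB = 10**12 bytes.
TOTAL_SHARDS = 20_000
DATA_NODES = 18
PRIMARIES_PER_INDEX = 30
REPLICAS = 1
INDEX_SIZE_GB = 300
MESSAGES_PER_SECOND = 10_000
DAILY_BYTES = 10**12

# Every primary has REPLICAS extra copies, so one index occupies this many shards.
shards_per_index = PRIMARIES_PER_INDEX * (1 + REPLICAS)

# Average load per data node and approximate number of indexes.
shards_per_node = TOTAL_SHARDS / DATA_NODES
index_count = TOTAL_SHARDS / shards_per_index

# Average size of a single primary shard.
gb_per_primary = INDEX_SIZE_GB / PRIMARIES_PER_INDEX

# Daily message volume and the implied average message size.
messages_per_day = MESSAGES_PER_SECOND * 86_400
bytes_per_message = DAILY_BYTES / messages_per_day

print(f"shards per index:  {shards_per_index}")
print(f"shards per node:   {shards_per_node:.0f}")
print(f"indexes (approx):  {index_count:.0f}")
print(f"GB per primary:    {gb_per_primary:.0f}")
print(f"bytes per message: {bytes_per_message:.0f}")
```

So under those assumptions each data node is carrying on the order of a thousand shards, which is the context for the rebalancing times I am seeing.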
So here are my questions:
- I know there are heuristics for when the cluster chooses to rebalance, but I don't understand what the numbers mean, so I am afraid to touch them. Are there any resources that describe this better (or should I be looking somewhere else)?
- I have looked at the thread pool queues but don't see any threads being maxed out during the rebalancing. I have also played with the concurrent rebalancing settings at the cluster level, but doing it slow (concurrent rebalancing of 2 or 3) or fast (30+) seems to have the same impact.
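To be concrete about which knobs I mean, here is a minimal sketch in Python. It only builds the JSON body for a `PUT /_cluster/settings` request and does not send anything. The helper name `build_rebalance_settings` and the `"50mb"` value are made up for illustration. The setting names are the ones in the Elasticsearch docs as far as I know, and the three `balance.*` values are what I understand to be the documented defaults, so please check them against your version.

```python
import json

# The balancing heuristics I am afraid to touch. Values are what I believe
# are the documented defaults (an assumption, verify for your version).
BALANCE_HEURISTICS = {
    "cluster.routing.allocation.balance.shard": 0.45,
    "cluster.routing.allocation.balance.index": 0.55,
    "cluster.routing.allocation.balance.threshold": 1.0,
}


def build_rebalance_settings(concurrent_rebalance, recovery_rate=None):
    """Build a transient-settings body for PUT /_cluster/settings.

    Hypothetical helper: it only assembles the dict, it does not call the cluster.
    """
    transient = {
        # How many shard rebalance moves may run cluster-wide at once.
        "cluster.routing.allocation.cluster_concurrent_rebalance": concurrent_rebalance,
    }
    if recovery_rate is not None:
        # Per-node throttle on recovery traffic, e.g. "50mb" (placeholder value).
        transient["indices.recovery.max_bytes_per_sec"] = recovery_rate
    return {"transient": transient}


# The "slow" setting from this post: 2 concurrent rebalances.
slow_body = build_rebalance_settings(2)

# The "fast" setting (30) plus an illustrative recovery throttle.
fast_body = build_rebalance_settings(30, recovery_rate="50mb")

print(json.dumps(slow_body, indent=2))
print(json.dumps(fast_body, indent=2))
```

Since changing the concurrency between those two extremes makes no visible difference for me, I suspect the bottleneck is somewhere other than the concurrency limit itself, possibly the recovery throttle, but that is a guess.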