Failed to process cluster event Exception

That depends on the shard count. That blog provides a lot of great info.

Potentially. You seem to be OK with testing, so look at setting up Rally with different shard counts/sizes.

The initial idea was to have one primary and two replicas to handle failover. I will go through the documents and get back to you with the test results.

Is your data important enough and the failure risk high enough to warrant two replicas though? I don't know your environment and requirements, but I figured I'd ask.

Having two replicas isn't a bad idea, you just need to be aware of the tradeoffs beyond just having 50% more data/shards/indices in your cluster :slight_smile:

Yes, the data is very important and failover must be handled. Adding an extra node is not a problem. I just need to know how far I can go in terms of scaling with respect to index count and aggressive indexing and aggregating.

Hi @warkolm
Coming back to the main problem, why am I seeing lots of pending tasks on each node? Let's consider my current setup with no replicas.
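
For reference, I'm checking the queue with the pending tasks API, roughly like this (a sketch using the Python requests library against a node on localhost:9200):

```python
# Summarise the cluster's pending task queue by source.
from collections import Counter

import requests

resp = requests.get("http://localhost:9200/_cluster/pending_tasks")
tasks = resp.json().get("tasks", [])

print(f"{len(tasks)} pending tasks")
for source, count in Counter(t["source"] for t in tasks).most_common(10):
    print(f"{count:6d}  {source}")
```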

Have a look at hot threads, it might add more context.

But it's likely because you have too many shards.
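
Something like this will dump hot threads from every node so you can see what they are actually busy with (a sketch, assuming the Python requests library and a node reachable on localhost:9200):

```python
# Fetch hot threads from all nodes; the API returns plain text, not JSON.
import requests

resp = requests.get(
    "http://localhost:9200/_nodes/hot_threads",
    params={"threads": 3, "interval": "500ms"},
)
print(resp.text)
```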

Most of them are shard initialization and indexing tasks. Some list the reason as NODE_LEFT, but I haven't seen any node get disconnected. Do you recommend increasing the default timeout of 30s? And which settings should I look into for scaled deployments like mine?

You should reduce your shard count.

I know I keep repeating that, but it's important. Each time a shard needs to be moved or changes state, a node leaves or joins, an index is created or updated, or a mapping is created or updated, the cluster master needs to update the cluster state.

In 5.X cluster states are propagated as deltas, whereas in 2.X we'd apply the change and then push the full state out to each node. So 5.X is much more efficient with this.

But when you have 45000 shards, that means the node routing/mapping state is huge. Irrespective of applying cluster state changes as just the deltas, iterating through a routing state of that size just to move even a single shard is a big job.

Changing settings here only ignores the problem. Reducing your shard count for this many nodes is the real solution.
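
If it helps, a quick way to see where those shards actually sit before deciding what to consolidate (a sketch, assuming the Python requests library and a node reachable on localhost:9200):

```python
# Count shards per node and per index using the cat shards API.
from collections import Counter

import requests

resp = requests.get(
    "http://localhost:9200/_cat/shards",
    params={"h": "index,node", "format": "json"},
)
shards = resp.json()

per_node = Counter(s["node"] or "UNASSIGNED" for s in shards)
per_index = Counter(s["index"] for s in shards)

print("shards per node:", dict(per_node))
print("top 10 indices by shard count:")
for index, count in per_index.most_common(10):
    print(f"{count:6d}  {index}")
```

Indices that come out of that with far more shards than they need are candidates for reindexing into fewer primaries (or, on 5.x, the shrink API).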

To give you more context, look at this: https://twitter.com/fdevillamil/status/917044462622330880/photo/1

  • 39899 shards
  • 186 nodes

They also have dedicated master nodes, with relatively powerful infrastructure.

You have 3 nodes and 45000 shards. There's a fair bit of a difference there.

@warkolm Thank you very much. I will definitely look into reducing the number of shards for my use case and into testing it on 5.x. Will get back to you if I have any queries.

I have a quick question on this. How did we come up with 300-500 shards? Does this number still hold when each shard is running at max capacity (2.1 billion records)? If I reduce the document count per shard to 1 billion, can I have 600 to 1000 shards? How do I find out how many shards I can have on a node?

The recommendation around number of shards per node in a cluster described in the blog post Mark linked to earlier in this thread is based on what we see in the field across multiple types of deployments. It is meant as a guideline and is not an absolute limit.

What drives this is the memory overhead that each shard/segment comes with, which is not linear. Larger shards/segments typically have a lower proportion of overhead relative to the data volume than smaller ones. Having twice the number of shards for the same number of documents is therefore likely to result in more overhead.
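
If you want to put a rough number on that overhead for your own cluster, the segments memory reported in node stats is a reasonable first approximation (a sketch, assuming the Python requests library and a node reachable on localhost:9200):

```python
# Report per-node heap spent on segment metadata via the node stats API.
import requests

resp = requests.get("http://localhost:9200/_nodes/stats/indices/segments")
for node_id, node in resp.json()["nodes"].items():
    seg = node["indices"]["segments"]
    mb = seg["memory_in_bytes"] / (1024 * 1024)
    print(f"{node['name']:20s} segments memory: {mb:8.1f} MB "
          f"({seg['count']} segments)")
```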

Exactly how much overhead your cluster can handle depends on how much heap querying and indexing require, and this will depend a lot on the use case. I know of users running with more shards than what we recommended in the blog post, but also of other users who had to set the limit lower.

The only way to accurately find out what the limit is for your use case is to test and benchmark with your data and work load.
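
If you don't want to jump straight into a full Rally setup, even a crude experiment along these lines shows the trend when you vary the shard count (a rough sketch, not a real benchmark; it assumes the Python requests library, a throwaway index name, and a 5.x node on localhost:9200):

```python
# Create a test index with a given shard count, bulk-index synthetic
# documents, and time it. Repeat with different NUM_SHARDS values.
import json
import time

import requests

ES = "http://localhost:9200"
INDEX = "shard-test"        # hypothetical throwaway index
NUM_SHARDS = 5              # vary this between runs
BATCHES, BATCH_SIZE = 20, 1000

requests.delete(f"{ES}/{INDEX}")  # ignore the 404 on the first run
requests.put(f"{ES}/{INDEX}", json={
    "settings": {"number_of_shards": NUM_SHARDS, "number_of_replicas": 0}
})

start = time.time()
for _ in range(BATCHES):
    lines = []
    for i in range(BATCH_SIZE):
        lines.append(json.dumps({"index": {}}))
        lines.append(json.dumps({"field": "value", "n": i}))
    # 5.x-style typed bulk endpoint; newer versions use /{index}/_bulk
    requests.post(
        f"{ES}/{INDEX}/doc/_bulk",
        data="\n".join(lines) + "\n",
        headers={"Content-Type": "application/x-ndjson"},
    )

elapsed = time.time() - start
print(f"{NUM_SHARDS} shards: {BATCHES * BATCH_SIZE} docs in {elapsed:.1f}s")
```

Rally will give you far more realistic numbers (real documents, proper warmup, latency percentiles), so treat something like this only as a sanity check.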
