Cluster Timeout 30s on ELK 2.4

Looking for some suggestions on how to figure out what is bottlenecking our cluster.

I have 18 servers (>128GB RAM and 20 cores each) running 22 nodes (3 client, 5 master, 14 data nodes); all JVMs are at a 30GB heap.

~340 indexes with ~20K primary shards (1 replica, so ~40K shards total) for about 15TB of space (EMC storage), indexing about 1.2M events a minute all day long.

If everything is fine I have no issues, but if I do any kind of maintenance (snapshots, rebalancing, restarts, etc.) I get a lot of 503 cluster timeouts or command timeouts.

What I don't get is that I have very little CPU utilization, load average is about 2.7, and there is only about 500Mb/s of disk I/O on the EMC, which is capable of a lot more (per benchmarks we have done, >4Gb/s writes).

So, I know there are the cat and diagnostic APIs for hot threads, pending tasks, and tasks, but I have no clue how to read them, nor what to look for that would be askew.
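For reference, these are roughly the calls I'm talking about (assuming the default HTTP port 9200 on localhost; adjust for your setup):

```
# Hot threads per node (lives under _nodes, not _cat)
curl -s 'localhost:9200/_nodes/hot_threads?threads=3'

# Cluster-level tasks queued up on the master
curl -s 'localhost:9200/_cat/pending_tasks?v'

# Currently executing tasks across the cluster (2.3+)
curl -s 'localhost:9200/_tasks?detailed=true'
```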

I have:

  • Tuned my index settings to reduce the shard count on small indexes, which should cut down some of the sharding (see the template sketch after this list)
  • We are planning a 5.2 upgrade, but that is probably still 1 to 2 months away
  • I have tuned the Networking and Disk I/O based on RHEL best practices.
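For the shard tuning, the kind of change I mean is an index template along these lines (a sketch for ES 2.x; the template name, index pattern, and shard count are just placeholders):

```
# Cap small daily indexes at 1 primary shard via a 2.x index template
curl -XPUT 'localhost:9200/_template/small_apps' -d '{
  "template": "smallapp-*",
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}'
```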

I am really baffled as to what the bottleneck is; any insight on what exactly to look for, or links, would be really helpful.

I'd suggest that the huge number of shards is the main issue.
That's an average of nearly 3K per node!

Right, but what governs "too many"? I mean, I have very low CPU utilization, load average is low, and I/O is also low.

Generally anything over 400-500 per node is too many.
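For a quick look at where you stand, `_cat/allocation` shows how many shard copies each node is currently carrying, plus disk usage (assuming the default port):

```
# First column ("shards") is the number of shard copies on each node
curl -s 'localhost:9200/_cat/allocation?v'
```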

But still, there has to be some kind of metric that shows the issue occurring; obviously with good hardware you can exceed that number. Before I exceeded the 200k level it seemed to be totally stable.

I am sure I can double the number of servers and/or split the cluster into multiple clusters, since the amount of memory can easily support it; I'll just adjust the number of processors available to Elasticsearch. That was the plan when I upgrade to 5.2 in the next month or so.

But still, I would like to know what metric on the master or on the Elasticsearch node would show me improvement or problems. I am using New Relic, so I could even watch the performance of a certain class if there is no reported metric.

It's a few things:

  • Cluster state updates and subsequent transmission - this will be super heavy
  • Routing calculations for queries + indexing

OK, is there a good way to see metrics on these? I mean, if there is a maximum number of shards per node, there should be some kind of actual metric to help diagnose this issue. I could keep guessing, but it would be nice to have something I can track over time.

Sounds like this might show up in the tasks or pending_tasks output? And what kind of task values would I be looking for?

Pending tasks will help for sure, but there's no specific metric for this.
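If you want something to track over time, the simplest thing is to poll the pending tasks queue and watch its depth and the time-in-queue column; a rough sketch, assuming default host/port:

```
# Poll every 10s; a queue that is rarely empty, or time-in-queue values
# that keep climbing, means the master can't keep up with state updates.
watch -n 10 "curl -s 'localhost:9200/_cat/pending_tasks?v' | head -20"
```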

OK, I will see about reducing the shards per node, but it sort of does not make sense that the CPU does not reflect this.
I guess there is some synchronized method that bottlenecks. Oh well, I guess that helps.

I was reading some articles about someone running 4 nodes on one server with the same hardware build. I guess when I go to 5.2 I will play with the number of nodes per host and see if I can add more hardware.

Cluster state updates are single threaded to ensure consistency.

Isn't the cluster state handled by the master? Why/how would there be a preference for 500 shards per node if it is the master that is possibly maxing out? I mean, I totally accept we are running too many shards per node, but updating the cluster state would be the same whether I had 10 nodes or 300 nodes; it would still be 300k shards. The amount of cluster state updates would be the same, at least from what I hear you saying.

Also, wouldn't the CPU be high on the active master node? Even a single-threaded method would still utilize 100% of a single core, but I don't believe I am seeing that. I will review.
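One way I plan to check that (a sketch; the node name is a placeholder I'll fill in from the first command):

```
# Find the elected master...
curl -s 'localhost:9200/_cat/master?v'

# ...then pull hot threads from just that node to see whether a single
# thread (e.g. cluster state updates) is pinning one core.
curl -s 'localhost:9200/_nodes/<master-node-name>/hot_threads?threads=5'
```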

Plus, 500 shards per node seems odd, since the documentation lists the max shards per node as unlimited by default. I would think that if there was a known limit, the default would be set to something reasonable, so people don't exceed it without a little work and without understanding this issue.

https://www.elastic.co/guide/en/elasticsearch/reference/current/allocation-total-shards.html

Is this just an oversight in the default values and/or in the documentation? Because it is a little confusing when they say these numbers are unlimited and experts say it is best to have ~500 shards per node.
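From that page, the closest thing to a hard cap seems to be the per-index setting; a sketch of what applying it would look like (the index name and the value of 2 are placeholders, and -1, i.e. unlimited, is the documented default):

```
curl -XPUT 'localhost:9200/logstash-2017.03.01/_settings' -d '{
  "index.routing.allocation.total_shards_per_node": 2
}'
```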

*Note: I still have to track pending tasks, so I will get back to that.

The changes are, yes, but all nodes have to apply them. And that's all single threaded.

A shard with 1 doc takes the same amount of base heap to maintain as a shard with 2 billion docs.
Plus every shard needs to be in the routing table, part of cluster state.
A query that needs to hit a shard that isn't on the node the query starts at has to figure out where that shard lives, which uses the routing table.

All this adds up.
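If you want to put a number on that overhead, a couple of quick checks (a sketch, assuming the default port) show how big the cluster state and routing table have become:

```
# Serialized size of the full cluster state, in bytes
curl -s 'localhost:9200/_cluster/state' | wc -c

# Total number of shard copies the routing table has to track
curl -s 'localhost:9200/_cat/shards' | wc -l
```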

As a follow-up:

OK, I have found some Curator problems where it was not removing old indexes, so I had a build-up of QA logs that were only being deleted by the 30-day rule rather than the 5-day rule. Cleaning that up removed 100+ indexes and 6000 or more total shards.
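For anyone hitting the same thing, the delete action (Curator 4.x style) looks roughly like this; the index prefix and timestring below are placeholders, not our real naming scheme:

```
actions:
  1:
    action: delete_indices
    description: Delete QA log indexes older than 5 days
    options:
      ignore_empty_list: True
    filters:
    - filtertype: pattern
      kind: prefix
      value: qa-logs-           # placeholder prefix
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'    # placeholder date pattern in the index name
      unit: days
      unit_count: 5
```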

I also adjusted all the index templates to more appropriate shard counts: down from 60 to 30 for the 400GB+ indexes, and 10 or fewer for the smaller ones.

Also, I have spawned 14 additional data nodes:

  • Originally 14 data nodes on 40CPU/96GB mem systems
  • Spawned an additional node on each host, so now it is 28 data nodes
  • Set the processors setting to 8 on each of them (will probably look to add a 3rd node per host)
    • Set a host name attribute and host-level allocation awareness so a replica doesn't wind up on the same host as its primary (see the config sketch after this list)
  • Also reduced the number of Logstash instances from 9 to 6
  • Reduced my master count from 5 to 3 after a conversation at the Elastic meetup
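Roughly what the per-node config ends up looking like (a sketch; the node name, host tag, and attribute name are placeholders, and on 5.x the attribute line would be `node.attr.host_id` instead):

```
# elasticsearch.yml for one of the two data nodes sharing a physical host
node.name: data-07b                 # placeholder
node.master: false
node.data: true

# This JVM only owns half of the host's cores, since a second
# node runs on the same box.
processors: 8

# Tag the node with the physical host it runs on (2.x syntax)...
node.host_id: host07                # placeholder

# ...and make shard allocation aware of that tag so a primary and its
# replica never land on the same physical machine.
cluster.routing.allocation.awareness.attributes: host_id
```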

Everything seems a lot more stable. Thanks for the conversation on it.

Now, back to the 5.2 upgrade.

