Failed to get local cluster state (critical issue)

I have 30+ clients putting data into a 60-data-node ES cluster, all running on physical machines. Around 15 TB of data gets inserted into ES daily, with a retention of 2 days. Heavy indexing happens on 3 main indexes, which are rolled over 4 times a day. Each rollover index has 100 shards and 1 replica. The translog is set to async and the refresh interval to 120 seconds.
ES version is 5.2 and we are using the Transport Client.
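
For reference, a minimal sketch of how index settings like the ones described above can be applied through the 5.x Transport Client; the cluster name, host and index name are placeholders, not our real ones:

```java
import java.net.InetAddress;

import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;

public class CreateRolloverIndex {
    public static void main(String[] args) throws Exception {
        Settings settings = Settings.builder()
                .put("cluster.name", "my-cluster")   // placeholder cluster name
                .build();
        try (TransportClient client = new PreBuiltTransportClient(settings)
                .addTransportAddress(new InetSocketTransportAddress(
                        InetAddress.getByName("localhost"), 9300))) {   // placeholder host
            // Create one rollover index with the settings described above.
            client.admin().indices().prepareCreate("events-000001")     // placeholder index name
                    .setSettings(Settings.builder()
                            .put("index.number_of_shards", 100)
                            .put("index.number_of_replicas", 1)
                            .put("index.translog.durability", "async")
                            .put("index.refresh_interval", "120s"))
                    .get();
        }
    }
}
```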

I often get the exception below in my client log, and it slows down my indexing drastically, so much that the clients end up playing a lot of catch-up with the current load. This exception occurs frequently.

[Jun 30 21:27:38] INFO  (TransportClientNodesService.java:522) - failed to get local cluster state for {#transport#-25}{pFu-9uiISaG_7UbNT-INPw}{56.241.23.136}{56.241.23.136:9303}, disconnecting...
org.elasticsearch.transport.ReceiveTimeoutTransportException:

This is creating a lot of trouble for us. Each ES node is running with less than 32 GB of heap to enable compressed OOPs, and around 200 GB of memory is left available for the filesystem cache.

We have an index-heavy load and even a slight lag leads to a lot of problems. Another problem is that once this exception appears, my client's indexing slows down automatically.

Please suggest how we can approach this problem. My application and the physical servers reside in the same DC and are connected over a 10 Gb pipe.

Thanks.

This seems to be the same issue as this thread. Did you try any of the suggestions made there, e.g. installing monitoring?

Hi Christian. Yes, I saw that post. Another member of our team posted that problem. This problem is serious, as due to this exception the bulk indexing slows down to a snail's pace. However, I think this is the root cause of that problem as well.
As far as your suggestions from that post are concerned:
There are 12 physical servers hosting 2 ES nodes each. Each node has been given less than 32 GB of heap, with over 200 GB left for the filesystem cache.
Given the nature of the application, we have 3 heavy-indexing indexes and unfortunately we cannot combine them into 1. There are different clients, and each client puts data into 1 of the 3 indexes. All 3 indexes are completely different.
Our current rollover cuts shards at roughly 35-40 GB per shard.

I was suggesting indexing into a smaller number of shards per index (typically one primary shard per data node) and increasing the average shard size. Also make sure you optimize the mappings, as this can save a lot of work and increase speed. This will mean that your indices instead roll over more frequently, which should not be a problem.
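
As a rough illustration only (the write alias and the thresholds below are placeholders, not a recommendation of exact values), a rollover on age and document count in 5.2 could look something like this; the new index then picks up its smaller shard count from a matching index template:

```java
import org.elasticsearch.action.admin.indices.rollover.RolloverResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.unit.TimeValue;

public class RolloverHelper {
    public static void rolloverIfNeeded(Client client) {
        RolloverResponse response = client.admin().indices()
                .prepareRolloverIndex("events-write")                    // hypothetical write alias
                .addMaxIndexAgeCondition(TimeValue.timeValueHours(6))    // roll at least every 6 hours
                .addMaxIndexDocsCondition(2_000_000_000L)                // or after ~2B docs
                .get();
        if (response.isRolledOver()) {
            System.out.println("Rolled over to " + response.getNewIndex());
        }
    }
}
```

In 5.2 the rollover conditions available are max_age and max_docs, so shard size is controlled indirectly via the document count.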

Have a look through the logs and see if there is any evidence of slow or frequent GC. If that is the case, you may need more heap, which means running multiple nodes per host. Also check whether you have any log entries indicating that the cluster state is slow to update or that distributing it times out.

It is good that each client indexes into a single index, as that makes bulk requests more targeted.


Hi Christian. We will go with 60 shards per index from tomorrow, but we are concerned that decreasing the shard count will lower indexing throughput.
The mappings are optimized as per the link. We don't have text fields; the majority of the fields are keywords. Since we need aggregations, doc values are enabled.
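
For illustration, a minimal sketch of a mapping along these lines; the index, type and field names are hypothetical, keyword fields keep doc_values enabled by default (which is what the aggregations need), and disabling _all is an optional 5.x index-time optimisation rather than something we necessarily do:

```java
import org.elasticsearch.client.Client;
import org.elasticsearch.common.xcontent.XContentBuilder;
import org.elasticsearch.common.xcontent.XContentFactory;

public class EventMapping {
    public static void putMapping(Client client) throws Exception {
        XContentBuilder mapping = XContentFactory.jsonBuilder()
                .startObject()
                    .startObject("event")                                   // hypothetical type
                        .startObject("_all").field("enabled", false).endObject() // optional: less index-time work
                        .startObject("properties")
                            .startObject("host").field("type", "keyword").endObject()
                            .startObject("status").field("type", "keyword").endObject()
                            .startObject("timestamp").field("type", "date").endObject()
                        .endObject()
                    .endObject()
                .endObject();
        client.admin().indices().preparePutMapping("events-000001")        // placeholder index
                .setType("event")
                .setSource(mapping)
                .get();
    }
}
```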
When you say that if there is slow or frequent GC we may need more heap, what do you mean by this? 24 of my data nodes are running on 12 servers with enormous RAM: the servers have 380 GB of RAM, out of which a total of 64 GB is given to the 2 ES data nodes (32 GB * 2) and the rest is left free. Our other 14 data nodes are located 1 per physical server, with almost 200 GB free for the filesystem cache. What is your recommendation here?
Yes, we do have log entries indicating that the cluster state is slow to update; this gets printed frequently (for the cluster:monitor/state requests).
Yes, each client fires into only 1 type of index via a bulk processor (roughly as in the sketch below).
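
A minimal sketch of a BulkProcessor targeting a single index; the alias, type and the flush/concurrency numbers are placeholders, not our production values:

```java
import org.elasticsearch.action.bulk.BackoffPolicy;
import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.Client;
import org.elasticsearch.common.unit.ByteSizeUnit;
import org.elasticsearch.common.unit.ByteSizeValue;
import org.elasticsearch.common.unit.TimeValue;

public class EventBulkIndexer {
    public static BulkProcessor build(Client client) {
        return BulkProcessor.builder(client, new BulkProcessor.Listener() {
            @Override
            public void beforeBulk(long executionId, BulkRequest request) { }

            @Override
            public void afterBulk(long executionId, BulkRequest request, BulkResponse response) {
                if (response.hasFailures()) {
                    System.err.println("Bulk had failures: " + response.buildFailureMessage());
                }
            }

            @Override
            public void afterBulk(long executionId, BulkRequest request, Throwable failure) {
                System.err.println("Bulk failed: " + failure);
            }
        })
        .setBulkActions(5000)                                    // flush every 5000 docs
        .setBulkSize(new ByteSizeValue(10, ByteSizeUnit.MB))     // or every 10 MB
        .setFlushInterval(TimeValue.timeValueSeconds(5))         // or every 5 seconds
        .setConcurrentRequests(2)                                // 2 concurrent bulks in flight
        .setBackoffPolicy(BackoffPolicy.exponentialBackoff())    // retry on bulk queue rejections
        .build();
    }

    public static void index(BulkProcessor processor, String json) {
        processor.add(new IndexRequest("events-write", "event").source(json)); // placeholder alias/type
    }
}
```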
Thanks.

Hello @jiouser

After reading both posts, yours and the one from the other member of your team, I think your issue seems to be a GC issue.

You have a lot of data, and if most of it is mapped as keyword that will increase the terms space in memory.

Could you check the following API:

GET _nodes/stats/jvm

and check how the heap space on your data nodes is doing?
Could you also check the CPU consumption on these nodes?
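
If it is easier from the application side, the same numbers can be pulled through the Transport Client; a minimal sketch, assuming a client like the one earlier in the thread:

```java
import org.elasticsearch.action.admin.cluster.node.stats.NodeStats;
import org.elasticsearch.action.admin.cluster.node.stats.NodesStatsResponse;
import org.elasticsearch.client.Client;

public class HeapCheck {
    public static void printHeap(Client client) {
        NodesStatsResponse stats = client.admin().cluster()
                .prepareNodesStats()
                .clear()
                .setJvm(true)   // only JVM stats (heap, GC), equivalent to GET _nodes/stats/jvm
                .get();
        for (NodeStats node : stats.getNodes()) {
            System.out.printf("%s heap used: %d%% (%s of %s)%n",
                    node.getNode().getName(),
                    node.getJvm().getMem().getHeapUsedPercent(),
                    node.getJvm().getMem().getHeapUsed(),
                    node.getJvm().getMem().getHeapMax());
        }
    }
}
```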

Hello @Juanma @Christian_Dahlqvist

I checked the heap on all data nodes with _cat/nodes.
With 32 GB of heap given, most of them are between 60-80%, with a few between 45-50%. RAM usage is between 95-100%; I understand Lucene uses all remaining memory as FS cache.

CPU is almost idle, peaking at 10%.

Lucene stores some data off heap, but the filesystem cache is primarily used when querying and does not necessarily benefit indexing much unless you are updating documents. If you have nodes at or above 75% heap usage, you are likely to have problems with GC, so I would recommend adding additional data nodes to the hosts in order to increase the available heap.

If you had monitoring installed you could quite easily see whether heap is an issue, as you should ideally be seeing a saw-tooth pattern. If that is not the case, it is possible that you are more or less constantly running GC.

Hi @Christian_Dahlqvist We are in the process of adding additional data nodes. We did indeed see a saw-tooth pattern on the data nodes, but we can still constantly see GC lines being printed in the ES logs.
I also observed that this exception only starts appearing when I enable sniffing in the client; if my client connects only to the dedicated coordinator nodes, these exceptions do not appear. Can I increase the timeout interval from 5 seconds to something bigger until we resolve the issue or add nodes?
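
Something along these lines is what I have in mind for the client settings; the cluster name and host are placeholders, and the 30 s values are just an example:

```java
import java.net.InetAddress;

import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;

public class ClientFactory {
    public static TransportClient build() throws Exception {
        Settings settings = Settings.builder()
                .put("cluster.name", "my-cluster")                      // placeholder
                .put("client.transport.sniff", true)                    // keep sniffing on
                .put("client.transport.ping_timeout", "30s")            // default is 5s
                .put("client.transport.nodes_sampler_interval", "30s")  // sample less often (default 5s)
                .build();
        return new PreBuiltTransportClient(settings)
                .addTransportAddress(new InetSocketTransportAddress(
                        InetAddress.getByName("coordinator-1"), 9300)); // placeholder host
    }
}
```

I realise raising the timeout only hides the symptom, since the request that times out is the client's cluster-state sampling request, but it might buy us time until the new nodes are in.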
Also, do these exceptions mean that any data directed towards that node will be missed or duplicated, since the node is not responding? I understand there is a dedicated ping channel between the node and the client, so this should not impact the bulk indexing that is already happening.

Thanks.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.