[Elasticsearch 2.2.0] I am occasionally getting ProcessClusterEventTimeoutException[failed to process cluster event (put-mapping [as]) within 30s] while bulk indexing documents

Hello, ES users.

I am occasionally getting ProcessClusterEventTimeoutException[failed to process cluster event (put-mapping [as]) within 30s] while doing bulk indexing.

failed to execute bulk item (index) index {[uh-as-440-20150720][as][a1092e5ad6b925eb7c262b748695c0eb42e2342e::BCQzRjvdQpKtARfKTuGpvw==]
...
...
ProcessClusterEventTimeoutException[failed to process cluster event (put-mapping [as]) within 30s]
at org.elasticsearch.cluster.service.InternalClusterService$2$1.run(InternalClusterService.java:343)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

The cluster runs without any problems for a while, but from time to time it throws this exception.
I am not quite sure what causes it.
The throughput to the ES cluster is steady.
(Currently it's quiet, but it wasn't while I was asleep.)

Q. What should I check to fix this issue?
Q. Is there a way to increase the timeout value?

Thank you. :smiley:

What version are you on, how much data are you indexing, and how much is in the cluster?

It's Elasticsearch v2.2.0.

{AWS EC2 m4.xlarge : 4 vCPUs, 16 GiB Mem} x 3

node-1: master/data
node-2: data-only
node-3: data-only

ES_HEAP_SIZE: 10g

Data: 100,000,000 docs
Indices: 13000
Shards: 3 / index
Replica: 1
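
As a back-of-the-envelope check (my own arithmetic from the numbers above, not from any cluster API), those settings imply a very large total shard count:

```python
# Shard count implied by the cluster stats above:
# 13,000 indices, 3 primary shards each, 1 replica per primary.
indices = 13_000
primaries_per_index = 3
replicas = 1  # one replica copy of each primary shard

primary_shards = indices * primaries_per_index
total_shards = primary_shards * (1 + replicas)

print(primary_shards)  # 39000 primary shards
print(total_shards)    # 78000 shards in total, spread over 3 data nodes
```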

After a while, the master throws the following OOM :sob:

[2016-02-21 22:03:47,104][WARN ][monitor.jvm              ] [Hulk] [gc][old][18835][175] duration [3m], collections [5]/[3m], total [3m]/[1h], memory [9.9gb]->[9.9gb]/[9.9gb], all_pools {[young] [266.2mb]->[266.2mb]/[266.2mb]}{[survivor] [33.2mb]->[33.2mb]/[33.2mb]}{[old] [9.6gb]->[9.6gb]/[9.6gb]}
[2016-02-21 22:31:27,605][WARN ][transport.netty          ] [Hulk] exception caught on transport layer [[id: 0x180573ba, /172.31.7.130:40806 => /172.31.9.135:9300]], closing connection
java.lang.OutOfMemoryError: Java heap space
        at org.apache.lucene.util.CharsRefBuilder.<init>(CharsRefBuilder.java:35)
        at org.elasticsearch.common.io.stream.StreamInput.<init>(StreamInput.java:246) 

Is the memory too low?
What's happening inside the master's memory? The data nodes seem to be working okay.

Shards are not free and carry a certain amount of overhead with respect to memory and file handles. With that many indices, the cluster state is also likely to be quite large and use up a fair amount of memory.

Having 78000 (if I count correctly) shards is way, way too many for a cluster of that size and specification, and will use up a lot of memory. I recommend you rethink your indexing/sharding strategy in order to dramatically reduce the number of shards in the cluster.
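
One common way to cut the shard count is to override the per-index defaults with an index template so that future indices get a single primary shard. A minimal sketch of an Elasticsearch 2.x-style template body — the template name, index pattern, and exact settings here are illustrative assumptions, not something from this thread:

```python
import json

# Hypothetical index template: indices matching "uh-as-*" would be created
# with one primary shard and one replica, instead of three primaries.
# (ES 2.x templates use the "template" key for the index-name pattern.)
template = {
    "template": "uh-as-*",        # index-name pattern the template applies to
    "settings": {
        "number_of_shards": 1,    # one primary shard per index
        "number_of_replicas": 1,  # keep a single replica for redundancy
    },
}

# This body would be PUT to /_template/<template-name> on the cluster;
# here we just print it to show the shape of the request.
print(json.dumps(template, indent=2))
```

Reducing the number of indices themselves (e.g. consolidating many small per-day or per-entity indices into fewer, larger ones) would shrink the cluster state far more than tuning shards alone.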

It seems this is a duplicate of this topic: What happends if I change master: true, data: true node to master-only node?

Would this be the same reason why the cluster throws ProcessClusterEventTimeoutException[failed to process cluster event (put-mapping [as]) within 30s]?

I would expect this to cause a range of cluster issues.

Let's continue the discussion here - What happends if I change master: true, data: true node to master-only node?
