Possible causes of ProcessClusterEventTimeoutException

Hello all,

We ran into a problem in our live environment that caused the index mapping process to fail. Here is exactly when the exception occurred.

We are running a 2-node cluster (Elasticsearch 1.4.1, 4 cores and 8 GB per node) in our live environment.

Our application creates one index per virtual event (I know that's too many indices, but we can't touch the legacy code). One day, the event creation process stopped working. After some investigation, we traced the problem to the index creation step. Looking further, we found that the indices were being created, but the subsequent _mapping API calls were failing: according to our logs, the _mapping API responded with ProcessClusterEventTimeoutException.
The problem persisted for quite some time, until we removed one node and restarted the server. I still don't know what caused it.
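For reference, the failing sequence (create the index, then call the _mapping API on it) looks roughly like the sketch below. It only builds the two HTTP requests with Python's standard library and does not send them. The node address, the `event_<id>` index naming, the `attendee` type and the field mapping are all hypothetical placeholders, not our actual code; the mapping call uses the 1.x form `PUT /{index}/_mapping/{type}`.

```python
import json
import urllib.request

# Assumption: address of one cluster node (not from the original post).
ES_URL = "http://localhost:9200"


def create_index_request(index: str) -> urllib.request.Request:
    """Step 1: create the per-event index (PUT /{index})."""
    return urllib.request.Request(f"{ES_URL}/{index}", method="PUT")


def put_mapping_request(index: str, doc_type: str, mapping: dict) -> urllib.request.Request:
    """Step 2: register the mapping via the _mapping API.

    Uses the 1.x form PUT /{index}/_mapping/{type}; this is the step that
    came back with ProcessClusterEventTimeoutException.
    """
    body = json.dumps({doc_type: mapping}).encode("utf-8")
    return urllib.request.Request(
        f"{ES_URL}/{index}/_mapping/{doc_type}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def event_index_requests(event_id: str) -> list:
    """Build both requests for one (hypothetical) virtual event."""
    index = f"event_{event_id}"  # hypothetical naming scheme
    # Hypothetical mapping; "string" is the 1.x text field type.
    mapping = {"properties": {"name": {"type": "string"}}}
    return [create_index_request(index), put_mapping_request(index, "attendee", mapping)]
```

Each request would be sent with `urllib.request.urlopen(req)` against a reachable node; both steps change the cluster state, so both go through the master.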

I would really like to avoid this kind of problem in the future. What are the possible causes of ProcessClusterEventTimeoutException, and how can I debug it if I encounter it again?

Please help!


Hi, did you by any chance find out what the problem was?

We are getting similar errors for all cluster-level operations: creating a snapshot repository, deleting an index, etc. To work around this, I had to add the "master_timeout" parameter to my REST requests and raise it from the default 30 seconds to 2 minutes.
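A minimal sketch of that workaround, assuming plain HTTP calls built with Python's standard library: `master_timeout` is passed as a query-string parameter with an Elasticsearch time value such as `2m`. The node address, index name, repository name and repository settings are hypothetical; the helper functions are illustrative and not part of any Elasticsearch client.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumption: address of one cluster node (not from the original post).
ES_URL = "http://localhost:9200"


def with_master_timeout(path: str, timeout: str = "2m") -> str:
    """Return a full URL with master_timeout added to the query string.

    The default master_timeout is 30s; "2m" mirrors the 2 minutes from this post.
    """
    sep = "&" if "?" in path else "?"
    return f"{ES_URL}{path}{sep}{urlencode({'master_timeout': timeout})}"


def delete_index_request(index: str, timeout: str = "2m") -> urllib.request.Request:
    """DELETE /{index} with a longer master_timeout."""
    return urllib.request.Request(with_master_timeout(f"/{index}", timeout), method="DELETE")


def put_repository_request(name: str, settings: dict, timeout: str = "2m") -> urllib.request.Request:
    """PUT /_snapshot/{name} (register a snapshot repository) with a longer master_timeout."""
    return urllib.request.Request(
        with_master_timeout(f"/_snapshot/{name}", timeout),
        data=json.dumps(settings).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
```

This only gives each request more time to wait for the master; it does not remove whatever is making cluster-level operations slow in the first place.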

We are running ES 1.1.0 on 57 Amazon i2.xlarge nodes with ~300 indices and ~2,200 shards.

Any ideas?