Process_cluster_event_timeout_exception error on put-mapping in test environment

diegomarcet · August 5, 2022, 6:27pm

Hello,

I'm using the docker.elastic.co/elasticsearch/elasticsearch:7.17.5 docker image to run our tests on the CI server in a single-node configuration.
We're frequently getting the following error while running the tests:

Elasticsearch::Transport::Transport::Errors::ServiceUnavailable:
       [503] {"error":{"root_cause":[{"type":"process_cluster_event_timeout_exception","reason":"failed to process cluster event (put-mapping [local_chapter_organiser_requests10/dzf0K-4nRlGVnpO7bHSvlQ]) within 30s"}],"type":"process_cluster_event_timeout_exception","reason":"failed to process cluster event (put-mapping [local_chapter_organiser_requests10/dzf0K-4nRlGVnpO7bHSvlQ]) within 30s"},"status":503}

We run on a ruby stack so we're using the elasticsearch ruby gems.
We've tried setting the master_timeout and timeout parameters when creating the indexes, as described in Update mapping API | Elasticsearch Guide [8.3] | Elastic, but this hasn't had any effect on the issue. It's definitely possible that we did this incorrectly, as the ruby gem's docs aren't super clear on how to pass these params, but before digging into that I wanted to confirm whether setting either of these to a longer timeout would fix the timeout issues.
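For reference, here is a minimal sketch of how these two parameters could be passed through the Ruby client. It assumes `client` is an `Elasticsearch::Client` from the 7.x elasticsearch gem, where `indices.put_mapping` accepts `master_timeout` and `timeout` as top-level keyword arguments next to `index` and `body`. The helper name and the `wait` default are made up for illustration.

```ruby
# Sketch only: `client` is assumed to be an Elasticsearch::Client (7.x gem).
# put_mapping_with_timeouts and the 60s default are hypothetical.
def put_mapping_with_timeouts(client, index:, mapping:, wait: '60s')
  client.indices.put_mapping(
    index: index,
    body: mapping,
    master_timeout: wait, # how long the request may wait on the master node
    timeout: wait         # how long to wait for the response/acknowledgement
  )
end
```

This only shows where the parameters go; whether a longer timeout actually resolves the error is a separate question.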

Thanks!
Diego.

Christian_Dahlqvist · August 5, 2022, 6:36pm

That should never take that long, so something is likely wrong, and I therefore do not think increasing the timeout will help. How much memory and heap does the node have available? Is there anything else running on the node? How much CPU is allocated? Does the cluster hold a lot of data or a large number of indices and shards? What type of storage are you using?

diegomarcet · August 5, 2022, 6:51pm

Hi Christian, thanks for replying.

We're limiting the heap size to 512 MB by passing the ES_JAVA_OPTS=-Xms512m -Xmx512m environment variable to the container. The node shouldn't hold much data, since we delete the indices after each test runs to ensure they're independent. We do run our tests in parallel processes against a single elasticsearch node, so we add suffixes to the end of the index names to ensure there's no clashing between processes.
If you think it's a resource constraint issue, I can try allocating more resources to the ES node.
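As an illustration of the suffixing scheme, a sketch along these lines would produce names such as local_chapter_organiser_requests10. It assumes the process number comes from a `TEST_ENV_NUMBER` environment variable, which is the convention of the parallel_tests gem (empty for the first process, then "2", "3", and so on). The thread does not say which mechanism is actually used, so treat both the variable and the helper name as hypothetical.

```ruby
# Builds a per-process index name so parallel test processes do not clash.
# TEST_ENV_NUMBER is an assumption (parallel_tests convention): it is empty
# or unset for the first process and "2", "3", ... for the others.
def suffixed_index_name(base, env = ENV)
  suffix = env.fetch('TEST_ENV_NUMBER', '')
  suffix = '1' if suffix.empty?
  "#{base}#{suffix}"
end
```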

Christian_Dahlqvist · August 5, 2022, 7:32pm

If you do not have a large cluster state due to a large number of shards, I would look for evidence of long GC in the logs or high iowait at the storage level. CPU allocation could possibly also play a part if it is very low.
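One way to look for long GC in the logs is to scan for GC overhead warnings. The sketch below assumes the line format that Elasticsearch 7.x typically emits from JvmGcMonitorService ("[gc][N] overhead, spent [X] collecting in the last [Y]"). The exact format may differ between versions, and the 0.5 threshold is arbitrary.

```ruby
# Matches GC overhead warnings. The line format is an assumption based on
# what Elasticsearch 7.x typically logs; adjust the pattern to your logs.
GC_OVERHEAD = /\[gc\]\[\d+\] overhead, spent \[([\d.]+)(ms|s|m)\] collecting in the last \[([\d.]+)(ms|s|m)\]/

UNIT_SECONDS = { 'ms' => 0.001, 's' => 1.0, 'm' => 60.0 }.freeze

# Returns one hash per warning where GC took at least min_ratio of the window.
def gc_overhead_events(lines, min_ratio: 0.5)
  lines.filter_map do |line|
    m = GC_OVERHEAD.match(line)
    next unless m

    spent = m[1].to_f * UNIT_SECONDS.fetch(m[2])
    window = m[3].to_f * UNIT_SECONDS.fetch(m[4])
    ratio = spent / window
    { spent_s: spent, window_s: window, ratio: ratio.round(2) } if ratio >= min_ratio
  end
end
```

It could be fed with something like `gc_overhead_events(File.foreach('elasticsearch.log'))`, where the log path is a placeholder.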

diegomarcet · August 10, 2022, 2:14pm

We spent some more time on this. After instrumenting our servers to get more fine-grained metrics, and also decreasing the number of parallel tests we run, we're still seeing the issue while CPU and I/O both look fine.

One thing I couldn't find any guidance on is how to run integration tests with Elasticsearch. We currently create indexes for all types of documents before each test run, and delete those indexes after the test finishes. We've also tried deleting all indexed documents after each run instead, without much success either.
Do you have any recommendations on how to run ES for a short time and for only a few documents at a time? Is there a way to run in-memory to avoid I/O waits?

Christian_Dahlqvist · August 10, 2022, 2:18pm

If you are making a lot of changes in parallel that affect the cluster state in a short amount of time, I guess these could be queued up and take a while to process, as the cluster state AFAIK is updated and propagated in a single thread for consistency.
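To check whether cluster state updates are in fact queuing up, the pending cluster tasks API (GET /_cluster/pending_tasks) can be polled while the tests run. The sketch below summarizes such a response. It assumes `response` is the parsed JSON body as a Hash, for example what `client.cluster.pending_tasks` returns with the 7.x Ruby gem. The helper name is hypothetical.

```ruby
# `response` is assumed to be the parsed body of GET /_cluster/pending_tasks.
# Each task is expected to carry 'source' and 'time_in_queue_millis' fields.
def summarize_pending_tasks(response)
  tasks = response.fetch('tasks', [])
  {
    queued: tasks.size,
    max_wait_ms: tasks.map { |t| t['time_in_queue_millis'].to_i }.max || 0,
    # Group by the leading word of the source, e.g. "put-mapping" or "create-index".
    by_source: tasks.map { |t| t['source'].to_s[/\A[\w-]+/] }.tally
  }
end
```

A consistently long queue dominated by put-mapping, create-index and delete-index entries would support the queuing explanation.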

