Timeout exceptions with many time-based indices after 00:00


I am putting various kinds of logs into Elasticsearch with daily time-based indices (somedata-YYYY.MM.DD).
Recently, I started putting many other kinds of logs into ES, and ES then started to log many ProcessClusterEventTimeoutException errors after 00:00 AM.

[2016-10-19 00:00:30,858][DEBUG][action.admin.indices.mapping.put] [myhostname] failed to put mappings on indices [[somedata-2016.10.18]], type [fluentd]

ProcessClusterEventTimeoutException[failed to process cluster event (put-mapping [fluentd]) within 30s]
at org.elasticsearch.cluster.service.InternalClusterService$2$1.run(InternalClusterService.java:349)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

Actually, I have more than 200 indices every day, so when the date changes, almost all of them need a new index for the next day.

I suspected that some node (maybe the master node) was under high CPU load, but as shown below, CPU usage is under 5% on all nodes (servers), and CPU usage actually drops during this time period (00:00-00:10 AM).

It seems that ES is not under high load, but something is blocking the operation. Can anyone suggest how I can investigate this?

By the way, I am running a 10-node Elasticsearch cluster, version 2.4.0.


Elasticsearch's cluster state management is single-threaded for simplicity, so I wouldn't expect to see the load average spike.
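One way to see that single queue backing up is the pending cluster tasks API, which lists queued cluster-state updates (such as put-mapping and create-index) along with their priority and time in queue. A minimal sketch, assuming a node reachable on the default local port:

```shell
# List cluster-state updates waiting in the (single-threaded) queue.
# Around 00:00 you would expect to see many create-index / put-mapping
# tasks stacked up here. Hostname and port are assumptions.
curl -s "http://localhost:9200/_cluster/pending_tasks?pretty"
```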

Are you using dynamic mappings? Those can cause lots of extra cluster state changes as new properties are dynamically added. It is usually much quicker to set up the mapping beforehand, either by creating the index with the mapping you want before it is needed, or by setting up index templates. Creating the indexes before they are needed is a fairly nice approach because you can stagger them, or just set the timeout to some very high number.
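A minimal sketch of pre-creating the next day's index, assuming GNU date; the field names in the mapping are just placeholders, and only the `somedata-*` name and `fluentd` type come from the question. A fixed base date is used here for illustration; a cron job run before midnight would use `+1 day` relative to now:

```shell
# Derive the next day's index name (fixed base date for illustration).
NEXT_INDEX="somedata-$(date -d '2016-10-18 +1 day' +%Y.%m.%d)"
echo "$NEXT_INDEX"   # somedata-2016.10.19

# Pre-create the index with an explicit mapping (ES 2.x syntax), so no
# dynamic put-mapping has to go through the cluster-state queue at 00:00.
# Commented out here since it needs a live cluster; field names are examples.
# curl -XPUT "http://localhost:9200/${NEXT_INDEX}" -d '{
#   "mappings": { "fluentd": { "properties": {
#     "@timestamp": { "type": "date" },
#     "message":    { "type": "string" }
#   }}}
# }'
```

Staggering when this runs per index family (rather than letting all 200 indices race at midnight) is what spreads the cluster-state updates out.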

200 daily indices sounds like a lot. What is the rationale behind having so many? How many shards does that result in on a daily basis? What is the average shard size? How long do you keep your data in the cluster?

Thanks for the suggestion.

Ah, I didn't know that cluster state management is single-threaded. That explains our situation.
Yes, I am using dynamic mappings. I think I should try creating the necessary indices beforehand.

The reason we have 200 daily indices is that we run a log collection and analytics platform, which collects various kinds of logs from many applications.

The number of shards is 10 per index, resulting in 2,000 shards per day. The average shard size is 127MB. We keep the data for only 2 days (because we use Hadoop HDFS for long-term storage).

Those amount to very small shards. It'd probably make sense to combine a few of them.
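One low-effort way to shrink the shard count is an index template that sets a smaller `number_of_shards` for all the daily indices and fixes the mapping at the same time. A sketch using the ES 2.x `_template` API; the template name, shard count, and field names are assumptions to adjust:

```shell
# Hypothetical template: create every somedata-* daily index with 2 primary
# shards instead of 10, and a fixed mapping for the "fluentd" type so no
# dynamic put-mapping is needed (ES 2.x _template API; config fragment only).
curl -XPUT "http://localhost:9200/_template/somedata_daily" -d '{
  "template": "somedata-*",
  "settings": { "number_of_shards": 2 },
  "mappings": {
    "fluentd": {
      "properties": {
        "@timestamp": { "type": "date" },
        "message":    { "type": "string" }
      }
    }
  }
}'
```

At ~127MB average per shard, dropping from 10 shards to 2 still leaves shards well under common size recommendations, while cutting the daily shard count from 2,000 to 400.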