Set timeout

Hi, how can I set a timeout in elasticsearch.yml? https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-health.html#cluster-health-api-query-params
I tried these: timeout: 60, query.timeout, Query.timeout - they all result in parse errors at startup.

Specifically this is the problem I'm trying to solve:
org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (delete-index [[foo_items_production_20190722133720589/lT4Zi0DBTtOXdCs_OxTcYA]]) within 30s

and the index is left hanging around. The problem happens in the staging environment and we don't care about fixing the real cause for now.

I think you should try the following. Place this setting in elasticsearch.yml:

elasticsearch.shardTimeout: 60000

Nope.

org.elasticsearch.bootstrap.StartupException: java.lang.IllegalArgumentException: unknown setting [elasticsearch.shardTimeout] please check that any required plugins are installed, or check the breaking changes documentation for removed settings

@rihad

So, is your Elasticsearch unable to start up, or is it up and running but unable to complete this deletion?

You can try deleting that index from Kibana and then restarting both.

The deletions (which we do a few times per day) work fine in 99% of cases; sometimes they can't complete in the allotted time and "leak".

There isn't really a need to increase a timeout here. If you don't care about fixing the underlying issue then you can simply ignore the ProcessClusterEventTimeoutException message. The index deletion will complete eventually.

Thanks. For some reason it does not complete. The deletion is triggered by running curl -X DELETE http://localhost:9200/name_of_index

If that's true then increasing timeouts will not help. We'll need to understand and fix the underlying problem. What version are you using, and what other log messages are emitted around the time of the index deletion? What do the pending tasks API and hot threads API indicate that the cluster is doing?
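(For reference, those can be queried with something like the following, assuming the same host and port as your curl command; the paths are the standard endpoints for those APIs.)

curl 'http://localhost:9200/_cluster/pending_tasks?pretty'
curl 'http://localhost:9200/_nodes/hot_threads'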

It happened with 6.8.3, I upgraded it to 6.8.5 yesterday but it didn't help. I've started noticing leaked indices only recently, we've been using ES since March.

Here are the log events, together with the successful deletion of another index. Just two indices: one succeeded, one failed:

[2020-01-02T06:06:29,369][INFO ][o.e.c.m.MetaDataDeleteIndexService] [master.example.net] [foo-xyzzy_items_production_20190722133720589/PjW_YrzuTj642q43abYEKQ] deleting index
[2020-01-02T06:08:07,213][DEBUG][o.e.a.a.i.d.TransportDeleteIndexAction] [master.example.net] failed to delete indices [[[foo-bar_items_production_20190722133720589/lT4Zi0DBTtOXdCs_OxTcYA]]]
org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (delete-index [[foo-bar_items_production_20190722133720589/lT4Zi0DBTtOXdCs_OxTcYA]]) within 30s
        at org.elasticsearch.cluster.service.MasterService$Batcher.lambda$onTimeout$0(MasterService.java:127) ~[elasticsearch-6.8.3.jar:6.8.3]
        at java.util.ArrayList.forEach(ArrayList.java:1257) ~[?:1.8.0_212]
        at org.elasticsearch.cluster.service.MasterService$Batcher.lambda$onTimeout$1(MasterService.java:126) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681) [elasticsearch-6.8.3.jar:6.8.3]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_212]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_212]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]
[2020-01-02T06:08:07,248][WARN ][r.suppressed             ] [master.example.net] path: /foo-bar_*, params: {index=foo-bar_*}
org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (delete-index [[foo-bar_items_production_20190722133720589/lT4Zi0DBTtOXdCs_OxTcYA]]) within 30s
        at org.elasticsearch.cluster.service.MasterService$Batcher.lambda$onTimeout$0(MasterService.java:127) ~[elasticsearch-6.8.3.jar:6.8.3]
        at java.util.ArrayList.forEach(ArrayList.java:1257) ~[?:1.8.0_212]
        at org.elasticsearch.cluster.service.MasterService$Batcher.lambda$onTimeout$1(MasterService.java:126) ~[elasticsearch-6.8.3.jar:6.8.3]
        at org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:681) [elasticsearch-6.8.3.jar:6.8.3] 
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_212]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_212]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_212]
[2020-01-02T06:08:21,374][WARN ][o.e.c.s.MasterService    ] [master.example.net] cluster state update task [delete-index [[foo-xyzzy_items_production_20190722133720589/PjW_YrzuTj642q43abYEKQ]]] took [1.8m] above the warn threshold of 30s 
[2020-01-02T06:08:21,374][WARN ][o.e.c.s.ClusterApplierService] [master.example.net] cluster state applier task [apply cluster state (from master [master {master.example.net}{ssdB4PLzQOaPOYEd7oqpFw}{p4jTjJECTDuwhqtrsu8Pww}{10.135.30.66}{10.135.30.66:9300}{xpack.installed=true} committed version [143] source [delete-index [[foo-xyzzy_items_production_20190722133720589/PjW_YrzuTj642q43abYEKQ]]]])] took [1.8m] above the warn threshold of 30s

If I re-run the failed curl command a few hours/days/weeks later it deletes the index eventually.

The thing is, this box is a DigitalOcean VPS and it uses FreeBSD+zfs+dedup. Dedup is horrible with deletions, sometimes locking the machine up for 5-10 seconds. The ES index deletion coincides with other maintenance tasks. This is very likely why the deletes take a long time, but I don't like the Java exception seen there.

Thanks, it's helpful to see a bit more context.

1.8 minutes to delete an index is pretty awful performance and I suspect this will be causing other issues too. There are other reasons that Elasticsearch deletes files apart from index deletion, and it expects these to happen pretty quickly. I don't think there's much value in trying to deduplicate any of Elasticsearch's data. Why not switch deduplication off for this filesystem?
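(If you decide to disable it, that's roughly a one-liner; the dataset name below is just a placeholder for wherever your Elasticsearch data lives. Note that already-written blocks stay deduplicated until they are rewritten.)

zfs set dedup=off tank/elasticsearch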

If you really want to continue like this, I think your only options are the timeout and master_timeout parameters of the delete index API, although this won't fix any other cases where deleting a file is too slow.
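For example, something along these lines; the 5m values are only illustrative:

curl -X DELETE 'http://localhost:9200/name_of_index?timeout=5m&master_timeout=5m'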


Thanks, I'll try tweaking those params and see if the problem goes away. If not, I will turn off dedup for ES. Unfortunately ZFS only shows dedup statistics at the pool level, not the filesystem level, so I can't tell if there's any merit in that. We use a few initially identical indices restored from backup, which simply have different names. For Postgres used in this way dedup is a huge win, though.
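(zpool list does show a pool-wide dedup ratio in its DEDUP column, e.g. the command below, with the pool name as a placeholder, but there's no per-dataset breakdown.)

zpool list tank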

But how do I set those parameters so that they persist? Can't they be set in elasticsearch.yml?

No, you have to pass them on each request.


I'm hitting this problem too, ES version 7.4.
@DavidTurner
