I don't see anything suspicious in the Elasticsearch logs, but I do see tasks queueing up, so it looks like it's working. However, I've not yet seen an index get completely merged down to a single segment per shard.
Note: Curator is running from a Docker container in Kubernetes.
5XX errors are server side, while Curator is a client-side process. A 4XX error would indicate Curator made a bad call. A 504 error indicates that there is a proxy, load balancer, or other gateway between Curator and your Elasticsearch node. Whatever value you set timeout_override to, it is evidently longer than the timeout the gateway (whatever type it may be) allows, so the gateway gives up first. More complete debug logging would show how long the client was connected, so you would be able to see this. It will usually be a nearly exact round number of seconds, like 60, 120, or 300.
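If you want to confirm the gateway timeout without digging through debug logs, you can time the call yourself. This is only a rough sketch, not anything Curator does: the helper names, the list of "common" timeouts, and the one-second tolerance are all assumptions of mine, and it uses plain `urllib` rather than the Elasticsearch client.

```python
import time
import urllib.error
import urllib.request

# Assumed list of typical gateway defaults; adjust for your proxy or load balancer.
COMMON_GATEWAY_TIMEOUTS = (30, 60, 120, 300, 600)


def matching_gateway_timeout(elapsed, tolerance=1.0,
                             candidates=COMMON_GATEWAY_TIMEOUTS):
    """Return the candidate timeout that `elapsed` seconds is close to, or None."""
    for candidate in candidates:
        if abs(elapsed - candidate) <= tolerance:
            return candidate
    return None


def time_forcemerge(base_url, index, socket_timeout=21600):
    """POST a forcemerge and return (HTTP status, seconds until the response)."""
    url = f"{base_url}/{index}/_forcemerge?max_num_segments=1"
    request = urllib.request.Request(url, method="POST")
    start = time.monotonic()
    try:
        with urllib.request.urlopen(request, timeout=socket_timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        # A 504 from the gateway is raised as HTTPError; keep its status code.
        status = err.code
    return status, time.monotonic() - start
```

Calling `time_forcemerge("http://localhost:9200", "my-index")` (both values are placeholders) and passing the elapsed time to `matching_gateway_timeout` should tell you whether the 504 arrives at a suspiciously round number of seconds.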
This isn't something Curator can compensate for, unfortunately. Forcemerge doesn't record a _task in the Tasks API, or set a lock in the cluster state, or anything like that. It is an opaque process, so Curator simply cannot see enough of what's going on to reconnect and resume after a 504 disconnect.
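What you can do by hand is check the result afterwards. A minimal sketch, assuming you have already fetched `GET _cat/segments/<index>?format=json` and parsed it into a list of dicts (one row per segment, with `index`, `shard`, and `prirep` fields); the function names and the sample rows are made up for illustration:

```python
from collections import Counter


def segments_per_shard(rows):
    """Count segments for each (index, shard, prirep) shard copy."""
    counts = Counter()
    for row in rows:
        counts[(row["index"], row["shard"], row["prirep"])] += 1
    return dict(counts)


def unmerged_shards(rows, max_num_segments=1):
    """Return only the shard copies that still have too many segments."""
    return {
        shard: count
        for shard, count in segments_per_shard(rows).items()
        if count > max_num_segments
    }


# Illustrative rows only; real output has more columns per segment.
sample_rows = [
    {"index": "logs-1", "shard": "0", "prirep": "p", "segment": "_a"},
    {"index": "logs-1", "shard": "0", "prirep": "p", "segment": "_b"},
    {"index": "logs-1", "shard": "1", "prirep": "p", "segment": "_c"},
]
print(unmerged_shards(sample_rows))
```

An empty result means every shard copy listed is at or below the target segment count. This only reports the end state; it does not show whether a merge is still running.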