Curator: Error with forcemerge

dawiro · June 19, 2018, 11:06am

Hi,
I'm seeing the following error when running forcemerge from Curator:

{
	"@timestamp": "2018-06-19T07:14:14.643Z",
	"function": "run",
	"linenum": 184,
	"loglevel": "ERROR",
	"message": "Failed to complete action: forcemerge.  <class 'curator.exceptions.FailedExecution'>: Exception encountered.  Rerun with loglevel DEBUG and/or check Elasticsearch logs for more information. Exception: TransportError(504, u\"<html><body><h1>504 Gateway Time-out</h1>\\nThe server didn't respond in time.\\n</body></html>\\n\")",
	"name": "curator.cli"
}

Action config looks like this:

actions:
  1:
    action: forcemerge
    description: "Index forcemerge test"
    options:
      max_num_segments: 1
      timeout_override: 43200
      ignore_empty_list: true
      delay: 300
      continue_if_exception: false
      disable_action: false
    filters:
    - filtertype: age
      source: name
      direction: older
      timestring: '%Y.%m.%d'
      unit: days
      unit_count: 1
    - filtertype: forcemerged
      max_num_segments: 1
      exclude: true

I don't see anything suspicious in the elasticsearch logs but do see tasks queueing up so it looks like it's working. However, I've not yet seen an index get completely merged down to a single segment per shard.

Note: Curator is running from a docker container in kubernetes.

theuntergeek · June 19, 2018, 12:28pm

5XX errors are server side, while Curator is a client-side process. A 4XX error would indicate Curator made a bad call. A 504 error indicates that there is a proxy, load balancer, or other gateway between Curator and your Elasticsearch node. No matter what you set your timeout_override to, the gateway (whatever type it may be) has its own, shorter timeout and will drop the connection first. More complete debug logging would show how long the client was connected, so you would be able to see this. It will usually be a nearly perfect round number of seconds, like 60, 120, or 300.
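
As a rough illustration of the "round number of seconds" check, here is a minimal Python sketch. It assumes you have already pulled the connection duration out of the DEBUG log yourself; the function name, the tolerance, and the candidate list are hypothetical and not part of Curator.

```python
from typing import Optional, Sequence

# Illustrative candidates only: the values mentioned in the post above.
# A real gateway may be configured with any timeout.
COMMON_GATEWAY_TIMEOUTS = (60, 120, 300)


def likely_gateway_timeout(
    elapsed_seconds: float,
    candidates: Sequence[int] = COMMON_GATEWAY_TIMEOUTS,
    tolerance: float = 2.0,
) -> Optional[int]:
    """Return the candidate timeout the elapsed time sits next to, if any.

    A connection that dies within `tolerance` seconds of a round value
    suggests an intermediary cut it off, not Elasticsearch itself.
    """
    for candidate in candidates:
        if abs(elapsed_seconds - candidate) <= tolerance:
            return candidate
    return None
```

For example, a connection that lasted 60.4 seconds would match the 60 second candidate, while one that ran for the full 43200 seconds of timeout_override would match none of them.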

This isn't something Curator can compensate for, unfortunately. Forcemerge doesn't record a _task in the Tasks API, or set a lock in the cluster state or anything like that. A forcemerge sets an invisible block in the cluster state that prevents any other forcemerges from running while another is in progress. This is an opaque process, unfortunately, so Curator simply cannot see what's going on enough to reconnect and resume after a 504 disconnect.
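
Since the merge may keep running on the cluster after the client is disconnected, one way to check on it afterwards is to look at per-shard segment counts. The sketch below is not something Curator does; it assumes you have fetched the JSON body of the index segments API (GET <index>/_segments) by some other means, and that each shard copy carries a num_search_segments field. The function name is hypothetical.

```python
from typing import Dict, List


def shards_above_segment_limit(
    segments_response: dict, max_num_segments: int = 1
) -> Dict[str, List[str]]:
    """Map each index name to the shard ids that still exceed the limit.

    `segments_response` is assumed to have the shape
    {"indices": {name: {"shards": {shard_id: [copy, ...]}}}}.
    """
    unmerged: Dict[str, List[str]] = {}
    for index_name, index_data in segments_response.get("indices", {}).items():
        for shard_id, copies in index_data.get("shards", {}).items():
            # A shard is reported once if any copy (primary or replica)
            # still has more segments than the target.
            if any(
                copy.get("num_search_segments", 0) > max_num_segments
                for copy in copies
            ):
                unmerged.setdefault(index_name, []).append(shard_id)
    return unmerged
```

An empty result would suggest every shard in the response is already at or below max_num_segments, which is the same condition the forcemerged filter in the action file is meant to exclude.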

dawiro · June 19, 2018, 12:30pm

Thanks, I'll have a look into how our load balancers are set up.

system · July 17, 2018, 12:30pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
