I'm running Elasticsearch 1.7.5 with 19 nodes (12 data nodes).
I'm setting up snapshots for backup and recovery, but I get a 503 whenever I create or delete a snapshot repository.
curl -XDELETE 'localhost:9200/_snapshot/backups?pretty'
{
  "error" : "RemoteTransportException[[masternodename][inet[/10.0.0.20:9300]][cluster:admin/repository/delete]]; nested: ProcessClusterEventTimeoutException[failed to process cluster event (delete_repository [backups]) within 30s]; ",
  "status" : 503
}
I added master_timeout=10m to the request, but it still times out. Is there a way to debug why this request is failing?
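For scripting the retry, here is a minimal Python sketch of the same DELETE request with an explicit `master_timeout`, using only the standard library. The helper names `build_delete_repo_request` and `delete_repo` are hypothetical, and the host and repository name are assumptions taken from the curl command above.

```python
import urllib.error
import urllib.parse
import urllib.request


def build_delete_repo_request(host: str, repo: str, master_timeout: str = "10m") -> urllib.request.Request:
    """Build DELETE /_snapshot/<repo> with an explicit master_timeout (hypothetical helper)."""
    query = urllib.parse.urlencode({"master_timeout": master_timeout, "pretty": "true"})
    url = f"{host.rstrip('/')}/_snapshot/{urllib.parse.quote(repo)}?{query}"
    return urllib.request.Request(url, method="DELETE")


def delete_repo(host: str, repo: str, master_timeout: str = "10m") -> str:
    """Send the request and return the response body, including error bodies."""
    req = build_delete_repo_request(host, repo, master_timeout)
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # A 503 still has a JSON body; return it so the nested exception is visible.
        return err.read().decode("utf-8")
```

For example, `delete_repo("http://localhost:9200", "backups")` issues the same call as the curl command, with a 10-minute master timeout instead of the 30s reported in the error.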
Do you see a lot of pending tasks on your master node when it happens? If you do, how many tasks are there and what's their most common source?
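One way to answer both questions is to query the master's pending cluster state tasks via `GET /_cluster/pending_tasks`, whose response holds a `tasks` array with fields such as `priority`, `source` and `time_in_queue_millis`. The Python sketch below counts the tasks and groups them by source. Grouping on the text before the first " [" is a heuristic assumption, so that for example `create-index [a], cause [api]` and `create-index [b], cause [api]` count as one kind of task.

```python
import json
import urllib.request
from collections import Counter


def summarize_pending_tasks(payload: dict, top: int = 3) -> tuple[int, list[tuple[str, int]]]:
    """Return (number of pending tasks, most common task kinds) from a pending_tasks response."""
    tasks = payload.get("tasks", [])
    # Heuristic: keep only the part of the source before the first " [".
    kinds = Counter(task.get("source", "").split(" [", 1)[0] for task in tasks)
    return len(tasks), kinds.most_common(top)


def fetch_pending_tasks(host: str = "http://localhost:9200") -> dict:
    """Fetch the current pending cluster state tasks from the cluster."""
    with urllib.request.urlopen(f"{host.rstrip('/')}/_cluster/pending_tasks") as resp:
        return json.load(resp)
```

Run `summarize_pending_tasks(fetch_pending_tasks())` while the repository call is hanging; a long list dominated by one source points at what is keeping the master busy.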
Originally the cluster had ~10 running tasks, all with a higher priority than the put/delete repository task. I tried the action again today with 0 running tasks, and it completed without delay.
I'll keep monitoring, but that may have been the issue.
Yes, I typically see this with large cluster states, when the cluster state update tasks queued ahead of a delete repository task take too long to finish (because of a large number of nodes, indices, types, aliases, etc.). We have improved the situation in later versions of Elasticsearch by shipping cluster state diffs instead of the complete cluster state, and by batching cluster state update tasks in more places.
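To make the queueing effect concrete, here is a toy Python simulation, not Elasticsearch code. It assumes the master runs cluster state update tasks one at a time, ordered by priority and then insertion order, and that every task is already queued at time zero. The task mix, durations and priorities in the example are hypothetical.

```python
import heapq
from dataclasses import dataclass, field

# Lower rank runs first; names mirror Elasticsearch's task priority levels.
PRIORITY_RANK = {"IMMEDIATE": 0, "URGENT": 1, "HIGH": 2, "NORMAL": 3, "LOW": 4, "LANGUID": 5}


@dataclass(order=True)
class Task:
    rank: int
    insert_order: int
    source: str = field(compare=False)
    duration_s: float = field(compare=False)


def wait_before_start(tasks: list[Task], target_source: str) -> float:
    """Seconds the target task waits in a single-threaded priority queue before it starts."""
    heap = list(tasks)
    heapq.heapify(heap)
    clock = 0.0
    while heap:
        task = heapq.heappop(heap)
        if task.source == target_source:
            return clock
        clock += task.duration_s  # each task ahead of the target delays it by its full duration
    raise ValueError(f"task {target_source!r} not in queue")


# Hypothetical mix: ten slow HIGH tasks queued ahead of a NORMAL repository delete.
queue = [Task(PRIORITY_RANK["HIGH"], i, f"update-mapping-{i}", 5.0) for i in range(10)]
queue.append(Task(PRIORITY_RANK["NORMAL"], 10, "delete_repository [backups]", 0.1))
wait = wait_before_start(queue, "delete_repository [backups]")
```

Here the delete waits 50 seconds before it starts, which exceeds the 30s timeout in the original error. Because the simulation ignores new higher-priority tasks arriving later and jumping ahead, a real queue can delay the task even longer, which is consistent with a 10m master_timeout still expiring.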