I'm trying to take a snapshot every 30 minutes, but the following error occurs:
{
"type": "concurrent_snapshot_execution_exception",
"reason": "[reponm-prd-snapshots:scheduled-lm_rkid_qtoa75fr0qvlca] a snapshot is already running",
"stack_trace": "ConcurrentSnapshotExecutionException[[reponm-prd-snapshots:scheduled-lm_rkid_qtoa75fr0qvlca] a snapshot is already running]\n\tat org.elasticsearch.snapshots.SnapshotsService$1.execute(SnapshotsService.java:203)\n\tat org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:47)\n\tat org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:702)\n\tat org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:324)\n\tat org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:219)\n\tat org.elasticsearch.cluster.service.MasterService.access$000(MasterService.java:73)\n\tat org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:151)\n\tat org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150)\n\tat org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188)\n\tat org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:636)\n\tat org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252)\n\tat org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)\n\tat java.base/java.lang.Thread.run(Thread.java:832)\n"
}
The specs are as follows:
Elasticsearch 7.8.0
Total nodes: 3 (master/data, master/data, master/data), clustered
{
"nodes": {
"sqMr-H3dRd6zqtcVysB-xg": {
"name": "node-3"
},
"EvHTgdmkSF-JnTwhORAV-Q": {
"name": "node-1"
},
"AUi5OleeTiKDGDhtYosJpw": {
"name": "node-2"
}
}
}
You should probably check before running the new snapshot that the previous one has finished. Or just "ignore" the error message and try again 30 minutes later.
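A minimal sketch of that check, assuming the cluster is reachable over plain HTTP without authentication. It uses `GET _snapshot/<repository>/_current`, which lists the snapshots currently running in a repository. The function names and the base URL in the usage note are placeholders for illustration, not anything taken from this thread.

```python
import json
import urllib.request


def snapshot_in_progress(current_response: dict) -> bool:
    """Return True if the parsed `_current` response lists a running snapshot."""
    return len(current_response.get("snapshots", [])) > 0


def fetch_current_snapshots(base_url: str, repository: str) -> dict:
    """Fetch the snapshots currently running in `repository`.

    `base_url` is an assumed address such as "http://localhost:9200";
    add authentication or TLS handling as your cluster requires.
    """
    url = f"{base_url}/_snapshot/{repository}/_current"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)
```

For example, a scheduler could call `snapshot_in_progress(fetch_current_snapshots("http://localhost:9200", "reponm-prd-snapshots"))` and skip the run when it returns `True`.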
The error occurs when the snapshot is executed by the schedule.
The problem stays the same even if the time interval is increased.
Snapshots are only ever executed by the schedule.
If you want to run the snapshot every 30 minutes, should the cron schedule not look something like this 0,30 * * * * ? Note that if the snapshot does not complete within 30 minutes you are likely to see the same problem with this schedule.
That is strange, but I think there might be a bug here; we've had similar reports before but never managed to reproduce them properly.
Could you paste the logs around the error message, including the part where the SnapshotsService logs the start of the snapshot that actually succeeds, so I can take a look?
@pmk sorry, I should have worded this more carefully.
What I'm looking for is full logs with timestamps that show both the ERROR for failing to start a snapshot and the logs for the concurrent snapshot that prevented the failing one from starting. So everything between a line like this for the running snapshot:
[2020-10-15T10:06:51,149][INFO ][o.e.s.SnapshotsService ] [node_s0] snapshot [test-repo:test-snap/88ZwRkUERZClvs2_0w4DQA] started
and its corresponding completion log, which looks like this:
[2020-10-15T10:06:51,574][INFO ][o.e.s.SnapshotsService ] [node_s0] snapshot [test-repo:test-snap/88ZwRkUERZClvs2_0w4DQA] completed with state [SUCCESS]
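To find that window in a large log file, something like the following could help. It is only a sketch: it assumes the log lines follow the format of the two examples above, and the names `EVENT` and `snapshot_windows` are made up for illustration.

```python
import re

# Matches SnapshotsService INFO lines shaped like the two examples above.
EVENT = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[INFO\s*\]\[o\.e\.s\.SnapshotsService\s*\]\s*"
    r"\[(?P<node>[^\]]+)\]\s*"
    r"snapshot \[(?P<snap>[^\]]+)\] (?P<event>started|completed)"
)


def snapshot_windows(lines):
    """Map each snapshot to its (start, end) timestamps; either may be None."""
    windows = {}
    for line in lines:
        m = EVENT.match(line)
        if not m:
            continue  # not a snapshot start/completion line
        snap = m.group("snap")
        start, end = windows.get(snap, (None, None))
        if m.group("event") == "started":
            start = m.group("ts")
        else:
            end = m.group("ts")
        windows[snap] = (start, end)
    return windows
```

Running it over the master node's log gives the start and completion timestamp for each snapshot, so the lines in between (including the ERROR) can be cut out and pasted here.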