Snapshot exception

Hi

I'm trying to take a snapshot every 30 minutes,
but the following error occurs.
Error:
{
"type": "concurrent_snapshot_execution_exception",
"reason": "[reponm-prd-snapshots:scheduled-lm_rkid_qtoa75fr0qvlca] a snapshot is already running",
"stack_trace": "ConcurrentSnapshotExecutionException[[reponm-prd-snapshots:scheduled-lm_rkid_qtoa75fr0qvlca] a snapshot is already running]\n\tat org.elasticsearch.snapshots.SnapshotsService$1.execute(SnapshotsService.java:203)\n\tat org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:47)\n\tat org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:702)\n\tat org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:324)\n\tat org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:219)\n\tat org.elasticsearch.cluster.service.MasterService.access$000(MasterService.java:73)\n\tat org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:151)\n\tat org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150)\n\tat org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188)\n\tat org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:636)\n\tat org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252)\n\tat org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)\n\tat java.base/java.lang.Thread.run(Thread.java:832)\n"
}

The environment is as follows:
Elasticsearch 7.8.0
Total nodes: 3 (master/data, master/data, master/data), clustered
{
  "nodes": {
    "sqMr-H3dRd6zqtcVysB-xg": {
      "name": "node-3"
    },
    "EvHTgdmkSF-JnTwhORAV-Q": {
      "name": "node-1"
    },
    "AUi5OleeTiKDGDhtYosJpw": {
      "name": "node-2"
    }
  }
}

I don't know what's wrong.
Please help me!

Welcome.

Please read this about how to format.

Here the error message seems to be clear:

a snapshot is already running

You should probably check before running the new snapshot that the previous one has finished. Or just "ignore" the error message and try again 30 minutes later.
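
As a rough illustration of the "check first" option, here is a minimal Python sketch. It assumes the cluster is reachable without authentication and relies on the `GET _snapshot/<repository>/_current` API, which lists the snapshots currently running in a repository. The function names and the base URL are hypothetical.

```python
import json
import urllib.request


def running_snapshots(current_response):
    """Return the names of snapshots listed in a `_current` response body."""
    return [s["snapshot"] for s in current_response.get("snapshots", [])]


def safe_to_start(current_response):
    """True when the repository reports no snapshot in progress."""
    return not running_snapshots(current_response)


def fetch_current(base_url, repository):
    """Fetch the currently running snapshots for one repository.

    Assumption: no authentication or TLS setup is needed for base_url.
    """
    url = f"{base_url}/_snapshot/{repository}/_current"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


# Example with hand-written response bodies (no cluster needed):
idle = {"snapshots": []}
busy = {"snapshots": [{"snapshot": "scheduled-example", "state": "IN_PROGRESS"}]}
print(safe_to_start(idle))   # nothing running, a new snapshot can start
print(safe_to_start(busy))   # one snapshot still running, wait and retry
```

Against a live cluster, something like `safe_to_start(fetch_current("http://localhost:9200", "reponm-prd-snapshots"))` would be the equivalent call, with the URL adjusted to your setup.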

  • Execution by schedule => Error
    The same problem occurs even when the time interval is increased.
    Snapshots are executed only by the schedule.

I don't understand. Could you clarify?

I registered the schedule in Snapshot and Restore,
and the snapshot only runs according to that schedule.
But I get the same error as above.

If you want to run the snapshot every 30 minutes, should the cron schedule not look something like this: 0 0,30 * * * ? (Elasticsearch cron expressions start with a seconds field, so the 0,30 belongs in the second position, the minutes.) Note that if the snapshot does not complete within 30 minutes you are likely to see the same problem with this schedule.
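
Elasticsearch cron expressions have six fields (plus an optional year), and the first one is seconds, so it is worth checking which field a value like 0,30 lands in. The following Python sketch labels the fields of an expression and expands simple values. The helper names are hypothetical, and it only handles plain `*`, lists, ranges and `/step` forms, not the full cron syntax.

```python
FIELD_NAMES = ["seconds", "minutes", "hours", "day_of_month",
               "month", "day_of_week", "year"]


def label_fields(expression):
    """Map each part of a 6- or 7-field cron expression to its field name."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError("expected 6 or 7 whitespace-separated fields")
    return dict(zip(FIELD_NAMES, parts))


def expand(field, lo, hi):
    """Expand a simple cron field into the sorted list of values it matches."""
    values = set()
    for part in field.split(","):
        if "/" in part:
            start, step = part.split("/")
            first = lo if start == "*" else int(start)
            values.update(range(first, hi + 1, int(step)))
        elif part == "*":
            values.update(range(lo, hi + 1))
        elif "-" in part:
            a, b = part.split("-")
            values.update(range(int(a), int(b) + 1))
        else:
            values.add(int(part))
    return sorted(values)


half_hourly = label_fields("0 0,30 * * * ?")
print(half_hourly["seconds"], "|", half_hourly["minutes"])
print(expand(half_hourly["minutes"], 0, 59))
```

With `0 0,30 * * * ?` the minutes field expands to 0 and 30, which gives one run every half hour. If the same `0,30` sat in the first position instead, it would be read as seconds and the policy would fire twice a minute.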


Hi @pmk

one thing to add to what @Christian_Dahlqvist points out

Note that if the snapshot does not complete within 30 minutes you are likely to see the same problem with this schedule.

This is a non-issue if you were to upgrade to v7.9 or later. We support fully concurrent snapshot operations from that version on. See:

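
To check whether a cluster is already on a version with concurrent snapshot operations, a small comparison against 7.9 is enough. This is only a sketch: the function name is hypothetical, and the version string is assumed to come from `version.number` in the response of `GET /`.

```python
def supports_concurrent_snapshots(version):
    """True if `version` (for example "7.8.0") is 7.9 or later."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) >= (7, 9)


print(supports_concurrent_snapshots("7.8.0"))   # the version used in this thread
print(supports_concurrent_snapshots("7.9.0"))
```

Comparing `(major, minor)` as a tuple avoids the string-comparison trap where "7.10" would sort before "7.9".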


Snapshot duration: 170s
I changed the schedule from every 30 minutes to once a day,
but the error message is still output.

@pmk

But the error message is still output.

That is strange, but I think there might be a bug here. We've had similar reports before but never managed to reproduce them properly.

Could you paste the logs around the error message, including the part where the SnapshotsService logs the start of the snapshot that actually runs, so I can take a look?

Thanks!


Settings:


Error message:

        {
          "type": "concurrent_snapshot_execution_exception",
          "reason": "[reponm-prd-snapshots:scheduled-v6zsv9rxrhgazmcdhrwlig]  a snapshot is already running",
          "stack_trace": "ConcurrentSnapshotExecutionException[[reponm-prd-snapshots:scheduled-v6zsv9rxrhgazmcdhrwlig]  a snapshot is already running]\n\tat org.elasticsearch.snapshots.SnapshotsService$1.execute(SnapshotsService.java:203)\n\tat org.elasticsearch.cluster.ClusterStateUpdateTask.execute(ClusterStateUpdateTask.java:47)\n\tat org.elasticsearch.cluster.service.MasterService.executeTasks(MasterService.java:702)\n\tat org.elasticsearch.cluster.service.MasterService.calculateTaskOutputs(MasterService.java:324)\n\tat org.elasticsearch.cluster.service.MasterService.runTasks(MasterService.java:219)\n\tat org.elasticsearch.cluster.service.MasterService.access$000(MasterService.java:73)\n\tat org.elasticsearch.cluster.service.MasterService$Batcher.run(MasterService.java:151)\n\tat org.elasticsearch.cluster.service.TaskBatcher.runIfNotProcessed(TaskBatcher.java:150)\n\tat org.elasticsearch.cluster.service.TaskBatcher$BatchedTask.run(TaskBatcher.java:188)\n\tat org.elasticsearch.common.util.concurrent.ThreadContext$ContextPreservingRunnable.run(ThreadContext.java:636)\n\tat org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.runAndClean(PrioritizedEsThreadPoolExecutor.java:252)\n\tat org.elasticsearch.common.util.concurrent.PrioritizedEsThreadPoolExecutor$TieBreakingPrioritizedRunnable.run(PrioritizedEsThreadPoolExecutor.java:215)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630)\n\tat java.base/java.lang.Thread.run(Thread.java:832)\n"
        }

@pmk sorry I should have worded this more carefully.

What I'm looking for is the full logs with timestamps that show both the ERROR for failing to start a snapshot and the logs for the concurrent snapshot that prevented the failing one from starting. So everything between a line like this for the running snapshot:

[2020-10-15T10:06:51,149][INFO ][o.e.s.SnapshotsService   ] [node_s0] snapshot [test-repo:test-snap/88ZwRkUERZClvs2_0w4DQA] started

and its corresponding completion log, which looks like this:

[2020-10-15T10:06:51,574][INFO ][o.e.s.SnapshotsService   ] [node_s0] snapshot [test-repo:test-snap/88ZwRkUERZClvs2_0w4DQA] completed with state [SUCCESS]

would be ideal if possible.
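
If the full logs cannot be shared, the start and completion lines can at least be paired locally to see which snapshot was running at the time of the error. The sketch below is an assumption-laden example: it expects log lines in exactly the format of the two samples above, and the function names are hypothetical.

```python
import re
from datetime import datetime

# Matches SnapshotsService "started" and "completed with state [...]" lines.
LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\[[A-Z]+\s*\]\[o\.e\.s\.SnapshotsService\s*\] "
    r"\[(?P<node>[^\]]+)\] snapshot \[(?P<snap>[^\]]+)\] "
    r"(?:(?P<started>started)|completed with state \[(?P<state>\w+)\])"
)


def snapshot_windows(lines):
    """Return {snapshot: (start, end, state)}; end/state are None while running."""
    windows = {}
    for line in lines:
        m = LINE.match(line.strip())
        if not m:
            continue  # not a snapshot start/completion line
        ts = datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S,%f")
        snap = m["snap"]
        if m["started"]:
            windows[snap] = (ts, None, None)
        else:
            start = windows.get(snap, (None, None, None))[0]
            windows[snap] = (start, ts, m["state"])
    return windows


def running_at(windows, when):
    """Snapshots whose start/end window contains the timestamp `when`."""
    return [snap for snap, (start, end, _) in windows.items()
            if start is not None and start <= when and (end is None or when < end)]


sample = [
    "[2020-10-15T10:06:51,149][INFO ][o.e.s.SnapshotsService   ] [node_s0] snapshot [test-repo:test-snap/88ZwRkUERZClvs2_0w4DQA] started",
    "[2020-10-15T10:06:51,574][INFO ][o.e.s.SnapshotsService   ] [node_s0] snapshot [test-repo:test-snap/88ZwRkUERZClvs2_0w4DQA] completed with state [SUCCESS]",
]
for snap, (start, end, state) in snapshot_windows(sample).items():
    print(snap, start, end, state)
```

Passing the timestamp of the ERROR line to `running_at` then shows whether any snapshot had started but not yet completed at that moment.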

Thanks again!

Due to security policy, files cannot be attached.

Thanks @pmk

We observed this issue in another context as well today and think this is a bug. We're now tracking the work on it in the issue below.

