How to monitor restore from snapshot - failure case?

I found that the /_cat/recovery API gives information about ongoing and completed snapshot recovery activity.

In the response body there is an attribute called stage, whose possible values are { init, index, start, translog, finalize, done }, but this doesn't say whether the restore failed or succeeded.
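For reference, here is a minimal sketch of how the stage values could be tallied for one index. It assumes an unsecured cluster reachable at a base URL such as http://localhost:9200, and it assumes that restore recoveries show up with type "snapshot" in the JSON output of /_cat/recovery. The function names are made up for illustration. As noted above, this only shows progress: a shard that never reaches done is not positively reported as failed.

```python
import json
import urllib.request


def fetch_recovery_rows(base_url, index):
    """Call GET /_cat/recovery/<index>?format=json and return the parsed rows.

    base_url is an assumption, e.g. "http://localhost:9200" with no auth.
    """
    url = f"{base_url}/_cat/recovery/{index}?format=json"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def summarize_snapshot_recoveries(rows):
    """Count recoveries by stage, keeping only rows whose type is "snapshot"."""
    counts = {}
    for row in rows:
        if row.get("type") != "snapshot":
            continue  # skip peer / store recoveries, which are not restores
        stage = row.get("stage", "unknown")
        counts[stage] = counts.get(stage, 0) + 1
    return counts


def all_done(counts):
    """True only if at least one snapshot recovery exists and all are done."""
    return bool(counts) and set(counts) == {"done"}
```

A caller would poll with something like summarize_snapshot_recoveries(fetch_recovery_rows("http://localhost:9200", "my-index")) until all_done(...) returns True, where "my-index" is a placeholder index name.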

I would like to know how to determine that a restore has failed.

Thanks for your help in advance.

If the restore fails then the cluster health is reported as red.
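As a rough sketch, the health of just the restored index can be read from the cluster health API. The base URL (for example http://localhost:9200, no authentication) and the function names are assumptions for illustration only.

```python
import json
import urllib.request


def fetch_index_health(base_url, index):
    """Call GET /_cluster/health/<index> and return the parsed response."""
    url = f"{base_url}/_cluster/health/{index}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def health_report(health):
    """Extract the fields that matter for a restore status report."""
    status = health.get("status")
    return {
        "status": status,
        # Red means at least one primary shard is not assigned.
        "is_red": status == "red",
        "initializing_shards": health.get("initializing_shards", 0),
        "unassigned_shards": health.get("unassigned_shards", 0),
    }
```

Restricting the request to the restored index keeps unrelated red indices out of the result, but a red status on its own still does not say why the primary is unassigned.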

Thanks @DavidTurner.

Cluster health can be red for other reasons as well. How can we tell that the cluster health is red because of a restore failure?

It is true: the restore might succeed, and then one of the primaries might fail for a different reason. Does the distinction matter? Can you explain how you would react differently in the two cases?

You can tell the difference with the cluster allocation explain API.
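A minimal sketch of such a check follows, assuming an unsecured cluster at a base URL such as http://localhost:9200. The request body fields (index, shard, primary) and the response fields current_state and unassigned_info are part of the allocation explain API. The restore_related flag is only a heuristic of my own, not something the API returns: it looks for the unassigned reasons NEW_INDEX_RESTORED or EXISTING_INDEX_RESTORED, or for the words "restore" or "snapshot" in the details text.

```python
import json
import urllib.request

# Unassigned reasons that indicate the shard was created by a restore.
RESTORE_REASONS = {"NEW_INDEX_RESTORED", "EXISTING_INDEX_RESTORED"}


def explain_primary(base_url, index, shard=0):
    """Ask the cluster allocation explain API about one primary shard."""
    body = json.dumps({"index": index, "shard": shard, "primary": True})
    req = urllib.request.Request(
        f"{base_url}/_cluster/allocation/explain",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def summarize_explanation(explanation):
    """Reduce an explain response to the fields useful in a user report."""
    info = explanation.get("unassigned_info") or {}
    reason = info.get("reason")
    details = info.get("details") or ""
    lowered = details.lower()
    return {
        "state": explanation.get("current_state"),
        "reason": reason,
        "details": details,
        # Heuristic only (an assumption, see the text above).
        "restore_related": reason in RESTORE_REASONS
        or "restore" in lowered
        or "snapshot" in lowered,
    }
```

Whatever the heuristic says, the details text is worth showing to the user as is, since it carries the actual failure message.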

I have a utility in which the user triggers the restore, and we provide a detailed report on the execution: whether the job is ongoing, succeeded, or failed. That is where this case comes up.

If we know that the restore was not successful, then we know the issue may be with the snapshot, or with something in the snapshot restore area that needs to be fixed, and we can re-attempt the restore.

If there is any way to make this distinction, it would help.

I'm still not sure I see the need to distinguish the two cases. Either way you will want to use the cluster allocation explain API to describe the problem to the user.
