Failure to back up Elasticsearch using Velero

Hi Folks!
We are running an Elasticsearch cluster on our EKS cluster and using Velero to back it up.
Elasticsearch version: 7.16.2
Number of ES nodes: 3
We noticed that Velero partially fails to back up the index at this path: nodes/0/indices/xxxxxx
That index is not present under indices on the node either, so we are not sure whether it is somehow being deleted.
We did not see any unusual messages indicating that the index is being deleted.
It would be great if you could help us fix this.
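One way to check the deletion theory: the directory names under nodes/0/indices/ (such as udnkYHyJTNuXbO0KZa24-Q in the Velero log) are index UUIDs, not index names. The minimal Python sketch below, using only the standard library, looks a UUID up in the output of `GET _cat/indices?format=json&h=index,uuid`. The cluster URL and the sample rows are hypothetical placeholders. If the UUID is not found, the index most likely no longer exists in the cluster; if it is found, the shard copy may simply have moved off this node.

```python
import json
import urllib.request
from typing import Optional


def fetch_cat_indices(es_url: str) -> list:
    """Fetch index names and UUIDs from the _cat/indices API as JSON rows."""
    url = f"{es_url}/_cat/indices?format=json&h=index,uuid"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def find_index_by_uuid(cat_indices: list, uuid: str) -> Optional[str]:
    """Return the index name whose UUID matches, or None if no index has it."""
    for row in cat_indices:
        if row.get("uuid") == uuid:
            return row.get("index")
    return None


if __name__ == "__main__":
    # Hypothetical sample rows; in practice use fetch_cat_indices("http://localhost:9200").
    rows = [{"index": "example-index", "uuid": "udnkYHyJTNuXbO0KZa24-Q"}]
    print(find_index_by_uuid(rows, "udnkYHyJTNuXbO0KZa24-Q"))
```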

Here are the logs for ES:

May 17, 2022 @ 06:00:01.609	[2022-05-17T00:30:01.218+0000][8][gc,age      ] GC(194) - age  14:        208 bytes,    6134736 total
	atlas-elasticsearch-master-2

May 17, 2022 @ 06:00:01.609	[2022-05-17T00:30:01.218+0000][8][gc,age      ] GC(194) - age  15:     355608 bytes,    6490344 total
	atlas-elasticsearch-master-2

May 17, 2022 @ 06:00:01.609	[2022-05-17T00:30:01.218+0000][8][gc,phases   ] GC(194)   Pre Evacuate Collection Set: 0.1ms
	atlas-elasticsearch-master-2

May 17, 2022 @ 06:00:01.609	[2022-05-17T00:30:01.218+0000][8][gc,phases   ] GC(194)   Merge Heap Roots: 0.1ms
	atlas-elasticsearch-master-2

May 17, 2022 @ 06:00:01.609	[2022-05-17T00:30:01.218+0000][8][gc,phases   ] GC(194)   Evacuate Collection Set: 13.0ms
	atlas-elasticsearch-master-2

May 17, 2022 @ 06:00:01.609	[2022-05-17T00:30:01.218+0000][8][gc,phases   ] GC(194)   Post Evacuate Collection Set: 3.2ms
	atlas-elasticsearch-master-2

May 17, 2022 @ 06:00:01.609	[2022-05-17T00:30:01.218+0000][8][gc,phases   ] GC(194)   Other: 0.3ms
	atlas-elasticsearch-master-2

May 17, 2022 @ 06:00:01.609	[2022-05-17T00:30:01.218+0000][8][gc,heap     ] GC(194) Eden regions: 612->0(612)
	atlas-elasticsearch-master-2

May 17, 2022 @ 06:00:01.609	[2022-05-17T00:30:01.218+0000][8][gc,heap     ] GC(194) Survivor regions: 2->2(77)
	atlas-elasticsearch-master-2

May 17, 2022 @ 06:00:01.609	[2022-05-17T00:30:01.218+0000][8][gc,heap     ] GC(194) Old regions: 111->111
	atlas-elasticsearch-master-2

May 17, 2022 @ 06:00:01.609	[2022-05-17T00:30:01.218+0000][8][gc,heap     ] GC(194) Archive regions: 2->2
	atlas-elasticsearch-master-2

May 17, 2022 @ 06:00:01.609	[2022-05-17T00:30:01.218+0000][8][gc,heap     ] GC(194) Humongous regions: 16->16
	atlas-elasticsearch-master-2

May 17, 2022 @ 06:00:01.609	[2022-05-17T00:30:01.218+0000][8][gc,metaspace] GC(194) Metaspace: 133600K(135360K)->133600K(135360K) NonClass: 116536K(117568K)->116536K(117568K) Class: 17064K(17792K)->17064K(17792K)
	atlas-elasticsearch-master-2

May 17, 2022 @ 06:00:01.609	[2022-05-17T00:30:01.218+0000][8][gc          ] GC(194) Pause Young (Normal) (G1 Evacuation Pause) 2966M->517M(4096M) 16.751ms
	atlas-elasticsearch-master-2

May 17, 2022 @ 06:00:01.609	[2022-05-17T00:30:01.218+0000][8][gc,cpu      ] GC(194) User=0.02s Sys=0.00s Real=0.02s
	atlas-elasticsearch-master-2

May 17, 2022 @ 06:00:01.609	[2022-05-17T00:30:01.218+0000][8][safepoint   ] Safepoint "G1CollectForAllocation", Time since last: 1380811914521 ns, Reaching safepoint: 110232 ns, At safepoint: 16832914 ns, Total: 16943146 ns
	atlas-elasticsearch-master-2

May 17, 2022 @ 05:50:02.385	[2022-05-17T00:20:01.957+0000][8][safepoint   ] Safepoint "Cleanup", Time since last: 20007202411 ns, Reaching safepoint: 97521 ns, At safepoint: 5320 ns, Total: 102841 ns
	atlas-elasticsearch-master-1

May 17, 2022 @ 05:49:42.382	[2022-05-17T00:19:41.950+0000][8][safepoint   ] Safepoint "Cleanup", Time since last: 59010060600 ns, Reaching safepoint: 95512 ns, At safepoint: 26230 ns, Total: 121742 ns
	atlas-elasticsearch-master-1

May 17, 2022 @ 05:48:43.373	[2022-05-17T00:18:42.940+0000][8][safepoint   ] Safepoint "Cleanup", Time since last: 94417368667 ns, Reaching safepoint: 140122 ns, At safepoint: 6150 ns, Total: 146272 ns
	atlas-elasticsearch-master-1

May 17, 2022 @ 05:47:09.353	[2022-05-17T00:17:08.492+0000][8][gc,start    ] GC(35) Pause Young (Normal) (G1 Evacuation Pause)
	atlas-elasticsearch-master-1

May 17, 2022 @ 05:47:09.353	[2022-05-17T00:17:08.492+0000][8][gc,task     ] GC(35) Using 1 workers of 1 for evacuation
	atlas-elasticsearch-master-1

May 17, 2022 @ 05:47:09.353	[2022-05-17T00:17:08.492+0000][8][gc,age      ] GC(35) Desired survivor size 161480704 bytes, new threshold 15 (max threshold 15)
	atlas-elasticsearch-master-1

May 17, 2022 @ 05:47:09.353	[2022-05-17T00:17:08.522+0000][8][gc,age      ] GC(35) Age table with threshold 15 (max threshold 15)
	atlas-elasticsearch-master-1

May 17, 2022 @ 05:47:09.353	[2022-05-17T00:17:08.522+0000][8][gc,age      ] GC(35) - age   1:    2096264 bytes,    2096264 total
	atlas-elasticsearch-master-1

May 17, 2022 @ 05:47:09.353	[2022-05-17T00:17:08.522+0000][8][gc,age      ] GC(35) - age   2:      27904 bytes,    2124168 total
	atlas-elasticsearch-master-1

May 17, 2022 @ 05:47:09.353	[2022-05-17T00:17:08.522+0000][8][gc,age      ] GC(35) - age   3:     351080 bytes,    2475248 total
	atlas-elasticsearch-master-1

May 17, 2022 @ 05:47:09.353	[2022-05-17T00:17:08.522+0000][8][gc,age      ] GC(35) - age   4:    1147344 bytes,    3622592 total
	atlas-elasticsearch-master-1

May 17, 2022 @ 05:47:09.353	[2022-05-17T00:17:08.522+0000][8][gc,age      ] GC(35) - age   5:     408704 bytes,    4031296 total
	atlas-elasticsearch-master-1

May 17, 2022 @ 05:47:09.353	[2022-05-17T00:17:08.522+0000][8][gc,age      ] GC(35) - age   6:      31048 bytes,    4062344 total
	atlas-elasticsearch-master-1

May 17, 2022 @ 05:47:09.353	[2022-05-17T00:17:08.522+0000][8][gc,age      ] GC(35) - age   7:      54296 bytes,    4116640 total
	atlas-elasticsearch-master-1

May 17, 2022 @ 05:47:09.353	[2022-05-17T00:17:08.522+0000][8][gc,age      ] GC(35) - age   8:      35712 bytes,    4152352 total
	atlas-elasticsearch-master-1

May 17, 2022 @ 05:47:09.353	[2022-05-17T00:17:08.522+0000][8][gc,age      ] GC(35) - age   9:     116504 bytes,    4268856 total
	atlas-elasticsearch-master-1

May 17, 2022 @ 05:47:09.353	[2022-05-17T00:17:08.522+0000][8][gc,age      ] GC(35) - age  10:      52728 bytes,    4321584 total
	atlas-elasticsearch-master-1

May 17, 2022 @ 05:47:09.353	[2022-05-17T00:17:08.522+0000][8][gc,age      ] GC(35) - age  11:      37464 bytes,    4359048 total
	atlas-elasticsearch-master-1

May 17, 2022 @ 05:47:09.353	[2022-05-17T00:17:08.522+0000][8][gc,age      ] GC(35) - age  12:      15968 bytes,    4375016 total
	atlas-elasticsearch-master-1

May 17, 2022 @ 05:47:09.353	[2022-05-17T00:17:08.522+0000][8][gc,age      ] GC(35) - age  13:    1706704 bytes,    6081720 total
	atlas-elasticsearch-master-1

May 17, 2022 @ 05:47:09.353	[2022-05-17T00:17:08.522+0000][8][gc,age      ] GC(35) - age  14:     341984 bytes,    6423704 total
	atlas-elasticsearch-master-1

May 17, 2022 @ 05:47:09.353	[2022-05-17T00:17:08.522+0000][8][gc,age      ] GC(35) - age  15:    1218104 bytes,    7641808 total
	atlas-elasticsearch-master-1

May 17, 2022 @ 05:47:09.353	[2022-05-17T00:17:08.522+0000][8][gc,phases   ] GC(35)   Pre Evacuate Collection Set: 0.2ms
	atlas-elasticsearch-master-1

May 17, 2022 @ 05:47:09.353	[2022-05-17T00:17:08.522+0000][8][gc,phases   ] GC(35)   Merge Heap Roots: 0.4ms
	atlas-elasticsearch-master-1

May 17, 2022 @ 05:47:09.353	[2022-05-17T00:17:08.522+0000][8][gc,phases   ] GC(35)   Evacuate Collection Set: 26.3ms
	atlas-elasticsearch-master-1

May 17, 2022 @ 05:47:09.353	[2022-05-17T00:17:08.522+0000][8][gc,phases   ] GC(35)   Post Evacuate Collection Set: 2.4ms
	atlas-elasticsearch-master-1

May 17, 2022 @ 05:47:09.353	[2022-05-17T00:17:08.522+0000][8][gc,phases   ] GC(35)   Other: 0.4ms
	atlas-elasticsearch-master-1

May 17, 2022 @ 05:47:09.353	[2022-05-17T00:17:08.522+0000][8][gc,heap     ] GC(35) Eden regions: 611->0(612)
	atlas-elasticsearch-master-1

May 17, 2022 @ 05:47:09.353	[2022-05-17T00:17:08.522+0000][8][gc,heap     ] GC(35) Survivor regions: 3->2(77)
	atlas-elasticsearch-master-1

May 17, 2022 @ 05:47:09.353	[2022-05-17T00:17:08.522+0000][8][gc,heap     ] GC(35) Old regions: 21->21
	atlas-elasticsearch-master-1

May 17, 2022 @ 05:47:09.353	[2022-05-17T00:17:08.522+0000][8][gc,heap     ] GC(35) Archive regions: 2->2
	atlas-elasticsearch-master-1

May 17, 2022 @ 05:47:09.353	[2022-05-17T00:17:08.522+0000][8][gc,heap     ] GC(35) Humongous regions: 2->2
	atlas-elasticsearch-master-1

May 17, 2022 @ 05:47:09.353	[2022-05-17T00:17:08.522+0000][8][gc,metaspace] GC(35) Metaspace: 117229K(118720K)->117229K(118720K) NonClass: 101928K(102784K)->101928K(102784K) Class: 15301K(15936K)->15301K(15936K)
	atlas-elasticsearch-master-1

May 17, 2022 @ 05:47:09.353	[2022-05-17T00:17:08.522+0000][8][gc          ] GC(35) Pause Young (Normal) (G1 Evacuation Pause) 2549M->103M(4096M) 29.855ms
	atlas-elasticsearch-master-1

May 17, 2022 @ 05:47:09.353	[2022-05-17T00:17:08.522+0000][8][gc,cpu      ] GC(35) User=0.03s Sys=0.01s Real=0.04s
	atlas-elasticsearch-master-1

May 17, 2022 @ 05:47:09.353	[2022-05-17T00:17:08.522+0000][8][safepoint   ] Safepoint "G1CollectForAllocation", Time since last: 607641133964 ns, Reaching safepoint: 129421 ns, At safepoint: 29983831 ns, Total: 30113252 ns
	atlas-elasticsearch-master-1


May 17, 2022 @ 05:37:01.225	[2022-05-17T00:07:00.851+0000][8][safepoint   ] Safepoint "Cleanup", Time since last: 2000296657 ns, Reaching safepoint: 98512 ns, At safepoint: 4290 ns, Total: 102802 ns
	atlas-elasticsearch-master-1

May 17, 2022 @ 05:37:00.435	[2022-05-17T00:07:00.389+0000][8][safepoint   ] Safepoint "Cleanup", Time since last: 117013643612 ns, Reaching safepoint: 97211 ns, At safepoint: 27141 ns, Total: 124352 ns
	atlas-elasticsearch-master-2

May 17, 2022 @ 05:36:59.225	[2022-05-17T00:06:58.851+0000][8][safepoint   ] Safepoint "Cleanup", Time since last: 303058536926 ns, Reaching safepoint: 144412 ns, At safepoint: 6630 ns, Total: 151042 ns
	atlas-elasticsearch-master-1

May 17, 2022 @ 05:35:03.421	[2022-05-17T00:05:03.375+0000][8][safepoint   ] Safepoint "Cleanup", Time since last: 2701328437869 ns, Reaching safepoint: 129033 ns, At safepoint: 7580 ns, Total: 136613 ns
	atlas-elasticsearch-master-2

May 17, 2022 @ 05:33:52.965	[2022-05-17T00:03:52.926+0000][8][gc,start    ] GC(117) Pause Young (Normal) (G1 Evacuation Pause)
	atlas-elasticsearch-master-0

May 17, 2022 @ 05:33:52.965	[2022-05-17T00:03:52.926+0000][8][gc,task     ] GC(117) Using 1 workers of 1 for evacuation
	atlas-elasticsearch-master-0

May 17, 2022 @ 05:33:52.965	[2022-05-17T00:03:52.926+0000][8][gc,age      ] GC(117) Desired survivor size 161480704 bytes, new threshold 15 (max threshold 15)
	atlas-elasticsearch-master-0

May 17, 2022 @ 05:33:52.965	[2022-05-17T00:03:52.941+0000][8][gc,age      ] GC(117) Age table with threshold 15 (max threshold 15)
	atlas-elasticsearch-master-0

May 17, 2022 @ 05:33:52.965	[2022-05-17T00:03:52.941+0000][8][gc,age      ] GC(117) - age   1:    1215920 bytes,    1215920 total
	atlas-elasticsearch-master-0

May 17, 2022 @ 05:33:52.965	[2022-05-17T00:03:52.941+0000][8][gc,age      ] GC(117) - age   2:      19088 bytes,    1235008 total
	atlas-elasticsearch-master-0

May 17, 2022 @ 05:33:52.965	[2022-05-17T00:03:52.941+0000][8][gc,age      ] GC(117) - age   3:     920648 bytes,    2155656 total
	atlas-elasticsearch-master-0

May 17, 2022 @ 05:33:52.965	[2022-05-17T00:03:52.941+0000][8][gc,age      ] GC(117) - age   4:     515320 bytes,    2670976 total
	atlas-elasticsearch-master-0

May 17, 2022 @ 05:33:52.965	[2022-05-17T00:03:52.941+0000][8][gc,age      ] GC(117) - age   5:     336976 bytes,    3007952 total
	atlas-elasticsearch-master-0

May 17, 2022 @ 05:33:52.965	[2022-05-17T00:03:52.941+0000][8][gc,age      ] GC(117) - age   6:    1181872 bytes,    4189824 total
	atlas-elasticsearch-master-0

May 17, 2022 @ 05:33:52.965	[2022-05-17T00:03:52.941+0000][8][gc,age      ] GC(117) - age   7:        152 bytes,    4189976 total
	atlas-elasticsearch-master-0

May 17, 2022 @ 05:33:52.965	[2022-05-17T00:03:52.941+0000][8][gc,age      ] GC(117) - age   8:     355048 bytes,    4545024 total
	atlas-elasticsearch-master-0

May 17, 2022 @ 05:33:52.965	[2022-05-17T00:03:52.941+0000][8][gc,age      ] GC(117) - age   9:    1173904 bytes,    5718928 total
	atlas-elasticsearch-master-0

May 17, 2022 @ 05:33:52.965	[2022-05-17T00:03:52.941+0000][8][gc,age      ] GC(117) - age  10:     150400 bytes,    5869328 total
	atlas-elasticsearch-master-0

May 17, 2022 @ 05:33:52.965	[2022-05-17T00:03:52.941+0000][8][gc,age      ] GC(117) - age  11:     577368 bytes,    6446696 total
	atlas-elasticsearch-master-0

May 17, 2022 @ 05:33:52.965	[2022-05-17T00:03:52.941+0000][8][gc,age      ] GC(117) - age  12:     573384 bytes,    7020080 total
	atlas-elasticsearch-master-0

May 17, 2022 @ 05:33:52.965	[2022-05-17T00:03:52.941+0000][8][gc,age      ] GC(117) - age  13:       1072 bytes,    7021152 total
	atlas-elasticsearch-master-0

May 17, 2022 @ 05:33:52.965	[2022-05-17T00:03:52.942+0000][8][gc,age      ] GC(117) - age  15:       1912 bytes,    7023064 total
	atlas-elasticsearch-master-0

May 17, 2022 @ 05:33:52.965	[2022-05-17T00:03:52.942+0000][8][gc,phases   ] GC(117)   Pre Evacuate Collection Set: 0.2ms
	atlas-elasticsearch-master-0

May 17, 2022 @ 05:33:52.965	[2022-05-17T00:03:52.942+0000][8][gc,phases   ] GC(117)   Merge Heap Roots: 0.1ms
	atlas-elasticsearch-master-0

May 17, 2022 @ 05:33:52.965	[2022-05-17T00:03:52.942+0000][8][gc,phases   ] GC(117)   Evacuate Collection Set: 13.4ms
	atlas-elasticsearch-master-0

May 17, 2022 @ 05:33:52.965	[2022-05-17T00:03:52.942+0000][8][gc,phases   ] GC(117)   Post Evacuate Collection Set: 2.1ms
	atlas-elasticsearch-master-0

May 17, 2022 @ 05:33:52.965	[2022-05-17T00:03:52.942+0000][8][gc,phases   ] GC(117)   Other: 0.2ms
	atlas-elasticsearch-master-0

May 17, 2022 @ 05:33:52.965	[2022-05-17T00:03:52.942+0000][8][gc,heap     ] GC(117) Eden regions: 612->0(612)
	atlas-elasticsearch-master-0

May 17, 2022 @ 05:33:52.965	[2022-05-17T00:03:52.942+0000][8][gc,heap     ] GC(117) Survivor regions: 2->2(77)
	atlas-elasticsearch-master-0

May 17, 2022 @ 05:33:52.965	[2022-05-17T00:03:52.942+0000][8][gc,heap     ] GC(117) Old regions: 81->81
	atlas-elasticsearch-master-0

May 17, 2022 @ 05:33:52.965	[2022-05-17T00:03:52.942+0000][8][gc,heap     ] GC(117) Archive regions: 2->2
	atlas-elasticsearch-master-0

May 17, 2022 @ 05:33:52.965	[2022-05-17T00:03:52.942+0000][8][gc,heap     ] GC(117) Humongous regions: 18->18
	atlas-elasticsearch-master-0

May 17, 2022 @ 05:33:52.965	[2022-05-17T00:03:52.942+0000][8][gc,metaspace] GC(117) Metaspace: 126889K(128512K)->126889K(128512K) NonClass: 110422K(111360K)->110422K(111360K) Class: 16466K(17152K)->16466K(17152K)
	atlas-elasticsearch-master-0

May 17, 2022 @ 05:33:52.965	[2022-05-17T00:03:52.942+0000][8][gc          ] GC(117) Pause Young (Normal) (G1 Evacuation Pause) 2855M->406M(4096M) 16.069ms
	atlas-elasticsearch-master-0

May 17, 2022 @ 05:33:52.965	[2022-05-17T00:03:52.942+0000][8][gc,cpu      ] GC(117) User=0.02s Sys=0.00s Real=0.02s
	atlas-elasticsearch-master-0

May 17, 2022 @ 05:33:52.965	[2022-05-17T00:03:52.942+0000][8][safepoint   ] Safepoint "G1CollectForAllocation", Time since last: 3466157501982 ns, Reaching safepoint: 33222 ns, At safepoint: 16140096 ns, Total: 16173318 ns
	atlas-elasticsearch-master-0


May 17, 2022 @ 05:31:56.154	[2022-05-17T00:01:55.792+0000][8][safepoint   ] Safepoint "Cleanup", Time since last: 21003821285 ns, Reaching safepoint: 118782 ns, At safepoint: 5680 ns, Total: 124462 ns
	atlas-elasticsearch-master-1

May 17, 2022 @ 05:31:35.151	[2022-05-17T00:01:34.788+0000][8][safepoint   ] Safepoint "Cleanup", Time since last: 574120261361 ns, Reaching safepoint: 260473 ns, At safepoint: 35751 ns, Total: 296224 ns

Velero partial failure log:

time="2022-05-17T00:14:10Z" level=info msg="1 errors encountered backup up item" backup=velero/velero-atlan-backup-20220517000038 logSource="pkg/backup/backup.go:413" name=logging-master-0
time="2022-05-17T00:14:10Z" level=error msg="Error backing up item" backup=velero/velero-atlan-backup-20220517000038 error="pod volume backup failed: error running restic backup, stderr={\"message_type\":\"error\",\"error\":{\"Op\":\"lstat\",\"Path\":\"nodes/0/indices/udnkYHyJTNuXbO0KZa24-Q\",\"Err\":2},\"during\":\"archival\",\"item\":\"/host_pods/71246dca-79c1-4bb6-9f5e-b5a058773186/volumes/kubernetes.io~aws-ebs/pvc-aacd1fc3-4a29-4833-b9e5-8a9554945804/nodes/0/indices/udnkYHyJTNuXbO0KZa24-Q\"}\nWarning: at least one source file could not be read\n: exit status 3" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/restic/backupper.go:184" error.function="github.com/vmware-tanzu/velero/pkg/restic.(*backupper).BackupPodVolumes" logSource="pkg/backup/backup.go:417" name=logging-master-0
time="2022-05-17T00:44:23Z" level=info msg="1 errors encountered backup up item" backup=velero/velero-atlan-backup-20220517000038 logSource="pkg/backup/backup.go:413" name=redis-node-0
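The restic error above is a JSON object whose `Err` field is a raw errno value. A minimal Python sketch for decoding it follows; it assumes the node runs Linux, where errno 2 is ENOENT ("No such file or directory"). That reading is consistent with the index directory being removed or moved between restic listing it and calling lstat on it, i.e. Elasticsearch changing its files while the backup was running. The JSON string is an abridged copy of the log entry (the "item" field is dropped).

```python
import errno
import json
import os

# Abridged restic stderr from the Velero log; "Err" is the raw errno value.
RESTIC_STDERR = (
    '{"message_type":"error","error":{"Op":"lstat",'
    '"Path":"nodes/0/indices/udnkYHyJTNuXbO0KZa24-Q","Err":2},'
    '"during":"archival"}'
)


def describe_restic_error(line: str) -> str:
    """Turn a restic JSON error line into 'op path: ERRNAME (message)'."""
    err = json.loads(line)["error"]
    code = err["Err"]
    name = errno.errorcode.get(code, "UNKNOWN")  # symbolic name, e.g. ENOENT
    return f'{err["Op"]} {err["Path"]}: {name} ({os.strerror(code)})'


print(describe_restic_error(RESTIC_STDERR))
```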

Regards,
Shaun

Note that volume or file system snapshots are not a supported backup mechanism for Elasticsearch. You need to use the snapshot and restore API instead, e.g. through snapshot lifecycle management (SLM).
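A minimal sketch of what the snapshot-based approach could look like, written in Python with only the standard library. It builds (without sending) the two REST calls involved: registering a snapshot repository with `PUT _snapshot/<repo>` and creating a scheduled SLM policy with `PUT _slm/policy/<id>`. The endpoint URL, repository name, policy name, bucket name, schedule and retention values are all hypothetical placeholders. An `s3` repository on 7.16 also assumes the `repository-s3` plugin is installed on every node; a shared-filesystem `fs` repository instead needs its location listed in `path.repo`.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # hypothetical endpoint; adjust for your cluster


def put_request(path: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a PUT request with a JSON body."""
    return urllib.request.Request(
        url=f"{ES_URL}/{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# 1. Register a snapshot repository (hypothetical repository and bucket names).
repo_req = put_request(
    "_snapshot/my_repository",
    {"type": "s3", "settings": {"bucket": "my-es-snapshots"}},
)

# 2. Create an SLM policy that snapshots all indices daily and prunes old snapshots.
slm_req = put_request(
    "_slm/policy/nightly-snapshots",
    {
        "schedule": "0 30 1 * * ?",        # cron expression: every day at 01:30
        "name": "<nightly-snap-{now/d}>",   # date-math snapshot name
        "repository": "my_repository",
        "config": {"indices": ["*"], "include_global_state": True},
        "retention": {"expire_after": "30d", "min_count": 5, "max_count": 50},
    },
)

if __name__ == "__main__":
    for req in (repo_req, slm_req):
        print(req.get_method(), req.full_url)
```

To apply the requests, pass each one to `urllib.request.urlopen()`; a secured cluster will also need authentication headers, which this sketch leaves out.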

Thank you Christian.

Will there be a release that supports file system backups soon?

Snapshot and restore allows you to back up to a shared filesystem as well as cloud storage. This has been the only supported backup mechanism for ages, and I do not anticipate that to change.
