Hi Folks!
We are running an Elasticsearch cluster on EKS and using Velero to back it up.
Elasticsearch version: 7.16.2
Number of ES Nodes: 3
We noticed that the Velero backup partially fails because it cannot read an index at this path: nodes/0/indices/xxxxxx
That index is not present under the indices directory on the node either. We are not sure whether it is being deleted somehow.
We did not see any unusual messages in the Elasticsearch logs indicating that the index is being deleted.
It would be great if you could help us fix this.
Here are the logs for ES:
May 17, 2022 @ 06:00:01.609 [2022-05-17T00:30:01.218+0000][8][gc,age ] GC(194) - age 14: 208 bytes, 6134736 total
atlas-elasticsearch-master-2
May 17, 2022 @ 06:00:01.609 [2022-05-17T00:30:01.218+0000][8][gc,age ] GC(194) - age 15: 355608 bytes, 6490344 total
atlas-elasticsearch-master-2
May 17, 2022 @ 06:00:01.609 [2022-05-17T00:30:01.218+0000][8][gc,phases ] GC(194) Pre Evacuate Collection Set: 0.1ms
atlas-elasticsearch-master-2
May 17, 2022 @ 06:00:01.609 [2022-05-17T00:30:01.218+0000][8][gc,phases ] GC(194) Merge Heap Roots: 0.1ms
atlas-elasticsearch-master-2
May 17, 2022 @ 06:00:01.609 [2022-05-17T00:30:01.218+0000][8][gc,phases ] GC(194) Evacuate Collection Set: 13.0ms
atlas-elasticsearch-master-2
May 17, 2022 @ 06:00:01.609 [2022-05-17T00:30:01.218+0000][8][gc,phases ] GC(194) Post Evacuate Collection Set: 3.2ms
atlas-elasticsearch-master-2
May 17, 2022 @ 06:00:01.609 [2022-05-17T00:30:01.218+0000][8][gc,phases ] GC(194) Other: 0.3ms
atlas-elasticsearch-master-2
May 17, 2022 @ 06:00:01.609 [2022-05-17T00:30:01.218+0000][8][gc,heap ] GC(194) Eden regions: 612->0(612)
atlas-elasticsearch-master-2
May 17, 2022 @ 06:00:01.609 [2022-05-17T00:30:01.218+0000][8][gc,heap ] GC(194) Survivor regions: 2->2(77)
atlas-elasticsearch-master-2
May 17, 2022 @ 06:00:01.609 [2022-05-17T00:30:01.218+0000][8][gc,heap ] GC(194) Old regions: 111->111
atlas-elasticsearch-master-2
May 17, 2022 @ 06:00:01.609 [2022-05-17T00:30:01.218+0000][8][gc,heap ] GC(194) Archive regions: 2->2
atlas-elasticsearch-master-2
May 17, 2022 @ 06:00:01.609 [2022-05-17T00:30:01.218+0000][8][gc,heap ] GC(194) Humongous regions: 16->16
atlas-elasticsearch-master-2
May 17, 2022 @ 06:00:01.609 [2022-05-17T00:30:01.218+0000][8][gc,metaspace] GC(194) Metaspace: 133600K(135360K)->133600K(135360K) NonClass: 116536K(117568K)->116536K(117568K) Class: 17064K(17792K)->17064K(17792K)
atlas-elasticsearch-master-2
May 17, 2022 @ 06:00:01.609 [2022-05-17T00:30:01.218+0000][8][gc ] GC(194) Pause Young (Normal) (G1 Evacuation Pause) 2966M->517M(4096M) 16.751ms
atlas-elasticsearch-master-2
May 17, 2022 @ 06:00:01.609 [2022-05-17T00:30:01.218+0000][8][gc,cpu ] GC(194) User=0.02s Sys=0.00s Real=0.02s
atlas-elasticsearch-master-2
May 17, 2022 @ 06:00:01.609 [2022-05-17T00:30:01.218+0000][8][safepoint ] Safepoint "G1CollectForAllocation", Time since last: 1380811914521 ns, Reaching safepoint: 110232 ns, At safepoint: 16832914 ns, Total: 16943146 ns
atlas-elasticsearch-master-2
May 17, 2022 @ 05:50:02.385 [2022-05-17T00:20:01.957+0000][8][safepoint ] Safepoint "Cleanup", Time since last: 20007202411 ns, Reaching safepoint: 97521 ns, At safepoint: 5320 ns, Total: 102841 ns
atlas-elasticsearch-master-1
May 17, 2022 @ 05:49:42.382 [2022-05-17T00:19:41.950+0000][8][safepoint ] Safepoint "Cleanup", Time since last: 59010060600 ns, Reaching safepoint: 95512 ns, At safepoint: 26230 ns, Total: 121742 ns
atlas-elasticsearch-master-1
May 17, 2022 @ 05:48:43.373 [2022-05-17T00:18:42.940+0000][8][safepoint ] Safepoint "Cleanup", Time since last: 94417368667 ns, Reaching safepoint: 140122 ns, At safepoint: 6150 ns, Total: 146272 ns
atlas-elasticsearch-master-1
May 17, 2022 @ 05:47:09.353 [2022-05-17T00:17:08.492+0000][8][gc,start ] GC(35) Pause Young (Normal) (G1 Evacuation Pause)
atlas-elasticsearch-master-1
May 17, 2022 @ 05:47:09.353 [2022-05-17T00:17:08.492+0000][8][gc,task ] GC(35) Using 1 workers of 1 for evacuation
atlas-elasticsearch-master-1
May 17, 2022 @ 05:47:09.353 [2022-05-17T00:17:08.492+0000][8][gc,age ] GC(35) Desired survivor size 161480704 bytes, new threshold 15 (max threshold 15)
atlas-elasticsearch-master-1
May 17, 2022 @ 05:47:09.353 [2022-05-17T00:17:08.522+0000][8][gc,age ] GC(35) Age table with threshold 15 (max threshold 15)
atlas-elasticsearch-master-1
May 17, 2022 @ 05:47:09.353 [2022-05-17T00:17:08.522+0000][8][gc,age ] GC(35) - age 1: 2096264 bytes, 2096264 total
atlas-elasticsearch-master-1
May 17, 2022 @ 05:47:09.353 [2022-05-17T00:17:08.522+0000][8][gc,age ] GC(35) - age 2: 27904 bytes, 2124168 total
atlas-elasticsearch-master-1
May 17, 2022 @ 05:47:09.353 [2022-05-17T00:17:08.522+0000][8][gc,age ] GC(35) - age 3: 351080 bytes, 2475248 total
atlas-elasticsearch-master-1
May 17, 2022 @ 05:47:09.353 [2022-05-17T00:17:08.522+0000][8][gc,age ] GC(35) - age 4: 1147344 bytes, 3622592 total
atlas-elasticsearch-master-1
May 17, 2022 @ 05:47:09.353 [2022-05-17T00:17:08.522+0000][8][gc,age ] GC(35) - age 5: 408704 bytes, 4031296 total
atlas-elasticsearch-master-1
May 17, 2022 @ 05:47:09.353 [2022-05-17T00:17:08.522+0000][8][gc,age ] GC(35) - age 6: 31048 bytes, 4062344 total
atlas-elasticsearch-master-1
May 17, 2022 @ 05:47:09.353 [2022-05-17T00:17:08.522+0000][8][gc,age ] GC(35) - age 7: 54296 bytes, 4116640 total
atlas-elasticsearch-master-1
May 17, 2022 @ 05:47:09.353 [2022-05-17T00:17:08.522+0000][8][gc,age ] GC(35) - age 8: 35712 bytes, 4152352 total
atlas-elasticsearch-master-1
May 17, 2022 @ 05:47:09.353 [2022-05-17T00:17:08.522+0000][8][gc,age ] GC(35) - age 9: 116504 bytes, 4268856 total
atlas-elasticsearch-master-1
May 17, 2022 @ 05:47:09.353 [2022-05-17T00:17:08.522+0000][8][gc,age ] GC(35) - age 10: 52728 bytes, 4321584 total
atlas-elasticsearch-master-1
May 17, 2022 @ 05:47:09.353 [2022-05-17T00:17:08.522+0000][8][gc,age ] GC(35) - age 11: 37464 bytes, 4359048 total
atlas-elasticsearch-master-1
May 17, 2022 @ 05:47:09.353 [2022-05-17T00:17:08.522+0000][8][gc,age ] GC(35) - age 12: 15968 bytes, 4375016 total
atlas-elasticsearch-master-1
May 17, 2022 @ 05:47:09.353 [2022-05-17T00:17:08.522+0000][8][gc,age ] GC(35) - age 13: 1706704 bytes, 6081720 total
atlas-elasticsearch-master-1
May 17, 2022 @ 05:47:09.353 [2022-05-17T00:17:08.522+0000][8][gc,age ] GC(35) - age 14: 341984 bytes, 6423704 total
atlas-elasticsearch-master-1
May 17, 2022 @ 05:47:09.353 [2022-05-17T00:17:08.522+0000][8][gc,age ] GC(35) - age 15: 1218104 bytes, 7641808 total
atlas-elasticsearch-master-1
May 17, 2022 @ 05:47:09.353 [2022-05-17T00:17:08.522+0000][8][gc,phases ] GC(35) Pre Evacuate Collection Set: 0.2ms
atlas-elasticsearch-master-1
May 17, 2022 @ 05:47:09.353 [2022-05-17T00:17:08.522+0000][8][gc,phases ] GC(35) Merge Heap Roots: 0.4ms
atlas-elasticsearch-master-1
May 17, 2022 @ 05:47:09.353 [2022-05-17T00:17:08.522+0000][8][gc,phases ] GC(35) Evacuate Collection Set: 26.3ms
atlas-elasticsearch-master-1
May 17, 2022 @ 05:47:09.353 [2022-05-17T00:17:08.522+0000][8][gc,phases ] GC(35) Post Evacuate Collection Set: 2.4ms
atlas-elasticsearch-master-1
May 17, 2022 @ 05:47:09.353 [2022-05-17T00:17:08.522+0000][8][gc,phases ] GC(35) Other: 0.4ms
atlas-elasticsearch-master-1
May 17, 2022 @ 05:47:09.353 [2022-05-17T00:17:08.522+0000][8][gc,heap ] GC(35) Eden regions: 611->0(612)
atlas-elasticsearch-master-1
May 17, 2022 @ 05:47:09.353 [2022-05-17T00:17:08.522+0000][8][gc,heap ] GC(35) Survivor regions: 3->2(77)
atlas-elasticsearch-master-1
May 17, 2022 @ 05:47:09.353 [2022-05-17T00:17:08.522+0000][8][gc,heap ] GC(35) Old regions: 21->21
atlas-elasticsearch-master-1
May 17, 2022 @ 05:47:09.353 [2022-05-17T00:17:08.522+0000][8][gc,heap ] GC(35) Archive regions: 2->2
atlas-elasticsearch-master-1
May 17, 2022 @ 05:47:09.353 [2022-05-17T00:17:08.522+0000][8][gc,heap ] GC(35) Humongous regions: 2->2
atlas-elasticsearch-master-1
May 17, 2022 @ 05:47:09.353 [2022-05-17T00:17:08.522+0000][8][gc,metaspace] GC(35) Metaspace: 117229K(118720K)->117229K(118720K) NonClass: 101928K(102784K)->101928K(102784K) Class: 15301K(15936K)->15301K(15936K)
atlas-elasticsearch-master-1
May 17, 2022 @ 05:47:09.353 [2022-05-17T00:17:08.522+0000][8][gc ] GC(35) Pause Young (Normal) (G1 Evacuation Pause) 2549M->103M(4096M) 29.855ms
atlas-elasticsearch-master-1
May 17, 2022 @ 05:47:09.353 [2022-05-17T00:17:08.522+0000][8][gc,cpu ] GC(35) User=0.03s Sys=0.01s Real=0.04s
atlas-elasticsearch-master-1
May 17, 2022 @ 05:47:09.353 [2022-05-17T00:17:08.522+0000][8][safepoint ] Safepoint "G1CollectForAllocation", Time since last: 607641133964 ns, Reaching safepoint: 129421 ns, At safepoint: 29983831 ns, Total: 30113252 ns
atlas-elasticsearch-master-1
May 17, 2022 @ 05:37:01.225 [2022-05-17T00:07:00.851+0000][8][safepoint ] Safepoint "Cleanup", Time since last: 2000296657 ns, Reaching safepoint: 98512 ns, At safepoint: 4290 ns, Total: 102802 ns
atlas-elasticsearch-master-1
May 17, 2022 @ 05:37:00.435 [2022-05-17T00:07:00.389+0000][8][safepoint ] Safepoint "Cleanup", Time since last: 117013643612 ns, Reaching safepoint: 97211 ns, At safepoint: 27141 ns, Total: 124352 ns
atlas-elasticsearch-master-2
May 17, 2022 @ 05:36:59.225 [2022-05-17T00:06:58.851+0000][8][safepoint ] Safepoint "Cleanup", Time since last: 303058536926 ns, Reaching safepoint: 144412 ns, At safepoint: 6630 ns, Total: 151042 ns
atlas-elasticsearch-master-1
May 17, 2022 @ 05:35:03.421 [2022-05-17T00:05:03.375+0000][8][safepoint ] Safepoint "Cleanup", Time since last: 2701328437869 ns, Reaching safepoint: 129033 ns, At safepoint: 7580 ns, Total: 136613 ns
atlas-elasticsearch-master-2
May 17, 2022 @ 05:33:52.965 [2022-05-17T00:03:52.926+0000][8][gc,start ] GC(117) Pause Young (Normal) (G1 Evacuation Pause)
atlas-elasticsearch-master-0
May 17, 2022 @ 05:33:52.965 [2022-05-17T00:03:52.926+0000][8][gc,task ] GC(117) Using 1 workers of 1 for evacuation
atlas-elasticsearch-master-0
May 17, 2022 @ 05:33:52.965 [2022-05-17T00:03:52.926+0000][8][gc,age ] GC(117) Desired survivor size 161480704 bytes, new threshold 15 (max threshold 15)
atlas-elasticsearch-master-0
May 17, 2022 @ 05:33:52.965 [2022-05-17T00:03:52.941+0000][8][gc,age ] GC(117) Age table with threshold 15 (max threshold 15)
atlas-elasticsearch-master-0
May 17, 2022 @ 05:33:52.965 [2022-05-17T00:03:52.941+0000][8][gc,age ] GC(117) - age 1: 1215920 bytes, 1215920 total
atlas-elasticsearch-master-0
May 17, 2022 @ 05:33:52.965 [2022-05-17T00:03:52.941+0000][8][gc,age ] GC(117) - age 2: 19088 bytes, 1235008 total
atlas-elasticsearch-master-0
May 17, 2022 @ 05:33:52.965 [2022-05-17T00:03:52.941+0000][8][gc,age ] GC(117) - age 3: 920648 bytes, 2155656 total
atlas-elasticsearch-master-0
May 17, 2022 @ 05:33:52.965 [2022-05-17T00:03:52.941+0000][8][gc,age ] GC(117) - age 4: 515320 bytes, 2670976 total
atlas-elasticsearch-master-0
May 17, 2022 @ 05:33:52.965 [2022-05-17T00:03:52.941+0000][8][gc,age ] GC(117) - age 5: 336976 bytes, 3007952 total
atlas-elasticsearch-master-0
May 17, 2022 @ 05:33:52.965 [2022-05-17T00:03:52.941+0000][8][gc,age ] GC(117) - age 6: 1181872 bytes, 4189824 total
atlas-elasticsearch-master-0
May 17, 2022 @ 05:33:52.965 [2022-05-17T00:03:52.941+0000][8][gc,age ] GC(117) - age 7: 152 bytes, 4189976 total
atlas-elasticsearch-master-0
May 17, 2022 @ 05:33:52.965 [2022-05-17T00:03:52.941+0000][8][gc,age ] GC(117) - age 8: 355048 bytes, 4545024 total
atlas-elasticsearch-master-0
May 17, 2022 @ 05:33:52.965 [2022-05-17T00:03:52.941+0000][8][gc,age ] GC(117) - age 9: 1173904 bytes, 5718928 total
atlas-elasticsearch-master-0
May 17, 2022 @ 05:33:52.965 [2022-05-17T00:03:52.941+0000][8][gc,age ] GC(117) - age 10: 150400 bytes, 5869328 total
atlas-elasticsearch-master-0
May 17, 2022 @ 05:33:52.965 [2022-05-17T00:03:52.941+0000][8][gc,age ] GC(117) - age 11: 577368 bytes, 6446696 total
atlas-elasticsearch-master-0
May 17, 2022 @ 05:33:52.965 [2022-05-17T00:03:52.941+0000][8][gc,age ] GC(117) - age 12: 573384 bytes, 7020080 total
atlas-elasticsearch-master-0
May 17, 2022 @ 05:33:52.965 [2022-05-17T00:03:52.941+0000][8][gc,age ] GC(117) - age 13: 1072 bytes, 7021152 total
atlas-elasticsearch-master-0
May 17, 2022 @ 05:33:52.965 [2022-05-17T00:03:52.942+0000][8][gc,age ] GC(117) - age 15: 1912 bytes, 7023064 total
atlas-elasticsearch-master-0
May 17, 2022 @ 05:33:52.965 [2022-05-17T00:03:52.942+0000][8][gc,phases ] GC(117) Pre Evacuate Collection Set: 0.2ms
atlas-elasticsearch-master-0
May 17, 2022 @ 05:33:52.965 [2022-05-17T00:03:52.942+0000][8][gc,phases ] GC(117) Merge Heap Roots: 0.1ms
atlas-elasticsearch-master-0
May 17, 2022 @ 05:33:52.965 [2022-05-17T00:03:52.942+0000][8][gc,phases ] GC(117) Evacuate Collection Set: 13.4ms
atlas-elasticsearch-master-0
May 17, 2022 @ 05:33:52.965 [2022-05-17T00:03:52.942+0000][8][gc,phases ] GC(117) Post Evacuate Collection Set: 2.1ms
atlas-elasticsearch-master-0
May 17, 2022 @ 05:33:52.965 [2022-05-17T00:03:52.942+0000][8][gc,phases ] GC(117) Other: 0.2ms
atlas-elasticsearch-master-0
May 17, 2022 @ 05:33:52.965 [2022-05-17T00:03:52.942+0000][8][gc,heap ] GC(117) Eden regions: 612->0(612)
atlas-elasticsearch-master-0
May 17, 2022 @ 05:33:52.965 [2022-05-17T00:03:52.942+0000][8][gc,heap ] GC(117) Survivor regions: 2->2(77)
atlas-elasticsearch-master-0
May 17, 2022 @ 05:33:52.965 [2022-05-17T00:03:52.942+0000][8][gc,heap ] GC(117) Old regions: 81->81
atlas-elasticsearch-master-0
May 17, 2022 @ 05:33:52.965 [2022-05-17T00:03:52.942+0000][8][gc,heap ] GC(117) Archive regions: 2->2
atlas-elasticsearch-master-0
May 17, 2022 @ 05:33:52.965 [2022-05-17T00:03:52.942+0000][8][gc,heap ] GC(117) Humongous regions: 18->18
atlas-elasticsearch-master-0
May 17, 2022 @ 05:33:52.965 [2022-05-17T00:03:52.942+0000][8][gc,metaspace] GC(117) Metaspace: 126889K(128512K)->126889K(128512K) NonClass: 110422K(111360K)->110422K(111360K) Class: 16466K(17152K)->16466K(17152K)
atlas-elasticsearch-master-0
May 17, 2022 @ 05:33:52.965 [2022-05-17T00:03:52.942+0000][8][gc ] GC(117) Pause Young (Normal) (G1 Evacuation Pause) 2855M->406M(4096M) 16.069ms
atlas-elasticsearch-master-0
May 17, 2022 @ 05:33:52.965 [2022-05-17T00:03:52.942+0000][8][gc,cpu ] GC(117) User=0.02s Sys=0.00s Real=0.02s
atlas-elasticsearch-master-0
May 17, 2022 @ 05:33:52.965 [2022-05-17T00:03:52.942+0000][8][safepoint ] Safepoint "G1CollectForAllocation", Time since last: 3466157501982 ns, Reaching safepoint: 33222 ns, At safepoint: 16140096 ns, Total: 16173318 ns
atlas-elasticsearch-master-0
May 17, 2022 @ 05:31:56.154 [2022-05-17T00:01:55.792+0000][8][safepoint ] Safepoint "Cleanup", Time since last: 21003821285 ns, Reaching safepoint: 118782 ns, At safepoint: 5680 ns, Total: 124462 ns
atlas-elasticsearch-master-1
May 17, 2022 @ 05:31:35.151 [2022-05-17T00:01:34.788+0000][8][safepoint ] Safepoint "Cleanup", Time since last: 574120261361 ns, Reaching safepoint: 260473 ns, At safepoint: 35751 ns, Total: 296224 ns
Velero partial failure log:
time="2022-05-17T00:14:10Z" level=info msg="1 errors encountered backup up item" backup=velero/velero-atlan-backup-20220517000038 logSource="pkg/backup/backup.go:413" name=logging-master-0
time="2022-05-17T00:14:10Z" level=error msg="Error backing up item" backup=velero/velero-atlan-backup-20220517000038 error="pod volume backup failed: error running restic backup, stderr={\"message_type\":\"error\",\"error\":{\"Op\":\"lstat\",\"Path\":\"nodes/0/indices/udnkYHyJTNuXbO0KZa24-Q\",\"Err\":2},\"during\":\"archival\",\"item\":\"/host_pods/71246dca-79c1-4bb6-9f5e-b5a058773186/volumes/kubernetes.io~aws-ebs/pvc-aacd1fc3-4a29-4833-b9e5-8a9554945804/nodes/0/indices/udnkYHyJTNuXbO0KZa24-Q\"}\nWarning: at least one source file could not be read\n: exit status 3" error.file="/go/src/github.com/vmware-tanzu/velero/pkg/restic/backupper.go:184" error.function="github.com/vmware-tanzu/velero/pkg/restic.(*backupper).BackupPodVolumes" logSource="pkg/backup/backup.go:417" name=logging-master-0
time="2022-05-17T00:44:23Z" level=info msg="1 errors encountered backup up item" backup=velero/velero-atlan-backup-20220517000038 logSource="pkg/backup/backup.go:413" name=redis-node-0
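For anyone reading the error above: here is a minimal sketch of how the restic stderr payload can be decoded. It assumes the numeric "Err" field is a Linux errno value (2 would then be ENOENT, "No such file or directory"), which matches our observation that the index directory is missing. The function name decode_restic_error is hypothetical, and the sample string is the stderr from the log above with the shell escaping removed.

```python
import errno
import json


def decode_restic_error(stderr_text):
    """Return (op, path, errno_name) for the first JSON error line, or None.

    Assumption: "Err" is a plain errno number, as in the log above.
    """
    for line in stderr_text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip plain-text lines such as the trailing warning
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue
        if record.get("message_type") != "error":
            continue
        err = record.get("error", {})
        code = err.get("Err")
        # errno.errorcode maps numbers to symbolic names, e.g. 2 -> "ENOENT"
        name = errno.errorcode.get(code, "UNKNOWN")
        return err.get("Op"), err.get("Path"), name
    return None


# stderr taken from the Velero log above, unescaped
sample = (
    '{"message_type":"error","error":{"Op":"lstat",'
    '"Path":"nodes/0/indices/udnkYHyJTNuXbO0KZa24-Q","Err":2},'
    '"during":"archival",'
    '"item":"/host_pods/71246dca-79c1-4bb6-9f5e-b5a058773186/volumes/'
    'kubernetes.io~aws-ebs/pvc-aacd1fc3-4a29-4833-b9e5-8a9554945804/'
    'nodes/0/indices/udnkYHyJTNuXbO0KZa24-Q"}\n'
    "Warning: at least one source file could not be read\n"
)

result = decode_restic_error(sample)
print(result)
```

If that reading is right, restic listed the index directory but could no longer lstat it when it came to archive it. That would be consistent with the directory disappearing while the backup was running, though the logs here do not confirm why.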
Regards,
Shaun