I have a nice Curator --> AWS S3 snapshot pipeline for backing up my clusters, but from time to time a snapshot fails, so I'm wondering what the best way to monitor for snapshot failures is.
Right now I'm sending all of my Curator events to my cluster via rsyslog, and I was planning on using watchers to send email based on certain conditions (like failures). Rsyslog works great and I have no doubt the watchers will technically work, but I don't know what the best thing to match on would be. I've thought about watching for the literal exception strings thrown by the various snapshot failure types, and I think that will work to a certain extent. My concern is that if I update Curator, ES, or anything else in the pipeline and those exception messages change, my watchers will silently stop matching. If you have any good ideas about what to 'watch' for, or a totally different approach that doesn't involve watchers or log parsing, I'd be very interested to hear your thoughts.
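
To make the "different approach" concrete: one signal that may be less brittle than exception text is the `state` field that the snapshot API reports for each snapshot (for example in the response to `GET _snapshot/<repository>/_all`). The following is only a minimal sketch under assumptions: the function name `failed_snapshots`, the set of states treated as healthy, and the sample snapshot names are all hypothetical, and the response is assumed to be already fetched and parsed into a dict.

```python
# Assumption: snapshots in these states do not need an alert.
# IN_PROGRESS is treated as healthy here; adjust to taste.
OK_STATES = {"SUCCESS", "IN_PROGRESS"}


def failed_snapshots(response):
    """Return (name, state) pairs for snapshots whose state is not OK.

    `response` is assumed to be the parsed JSON body of a snapshot
    listing, i.e. a dict with a "snapshots" list whose entries carry
    "snapshot" (the name) and "state" keys.
    """
    return [
        (snap.get("snapshot"), snap.get("state"))
        for snap in response.get("snapshots", [])
        if snap.get("state") not in OK_STATES
    ]


# Hypothetical sample data, not real output from any cluster.
sample = {
    "snapshots": [
        {"snapshot": "curator-20170101", "state": "SUCCESS"},
        {"snapshot": "curator-20170102", "state": "PARTIAL"},
        {"snapshot": "curator-20170103", "state": "FAILED"},
    ]
}

for name, state in failed_snapshots(sample):
    print(f"snapshot {name} ended in state {state}")
```

A check like this keys off a small set of state values instead of free-form messages, so it would not depend on how any particular version words its exceptions. Whether it is run from a cron job or expressed as a watcher condition is a separate choice.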