I run a fairly open data pipeline for my company, which lets users manage their own data. Unfortunately, this leads to devs releasing code that changes field types on me, which causes document mapping exceptions, "Could not index event to Elasticsearch." warnings, and so on.
I'd like to be able to measure this and put monitoring around it. I've been looking at the Logstash metrics but haven't been able to determine whether the metrics I need are there. It seems unlikely, but maybe the difference between IN and OUT on the ES output plugin would work?
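To make the IN/OUT idea concrete, here is a rough sketch of what I have in mind. It assumes the Logstash monitoring API is reachable on its default port (9600) and that each output plugin reports `events.in` and `events.out` under `/_node/stats/pipelines`; the function names are my own. I don't know whether rejected documents actually show up as a gap here, and events still in flight would also count, so treat it as an experiment rather than a working alert.

```python
import json
from urllib.request import urlopen


def es_output_gaps(stats):
    """Return {(pipeline, plugin_id): events_in - events_out} for every
    elasticsearch output found in a parsed node-stats response."""
    gaps = {}
    for pipeline_name, pipeline in (stats.get("pipelines") or {}).items():
        outputs = ((pipeline or {}).get("plugins") or {}).get("outputs") or []
        for plugin in outputs:
            if plugin.get("name") != "elasticsearch":
                continue  # only the ES output is of interest here
            events = plugin.get("events") or {}
            gap = events.get("in", 0) - events.get("out", 0)
            gaps[(pipeline_name, plugin.get("id"))] = gap
    return gaps


def fetch_stats(url="http://localhost:9600/_node/stats/pipelines"):
    """Fetch node stats; the URL is an assumption (default API host and port)."""
    with urlopen(url, timeout=5) as resp:
        return json.load(resp)
```

Calling `es_output_gaps(fetch_stats())` on a schedule and graphing the result would at least show whether the gap moves when a bad mapping is deployed.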
I'm considering taking logstash-plain and loading it into ES so that I can make the devs watch it and put alerts on it, but I'm concerned that I could end up with a horrible, horrible loop that crashes everything: if indexing the log itself fails, that failure gets logged, and the new log lines get fed straight back in.
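One alternative I've thought about that avoids the loop is to tally the failures outside the pipeline entirely. The sketch below assumes the warnings in logstash-plain contain the text "Could not index event to Elasticsearch" along with a Ruby-style hash holding `:_index=>"..."` and `"type"=>"..."` entries, which is what I see in my logs; the regexes and the function name are mine and may need adjusting for other versions.

```python
import re
from collections import Counter

MARKER = "Could not index event to Elasticsearch"
# Assumed log shapes: :_index=>"name" in the action, "type"=>"..." in the error.
INDEX_RE = re.compile(r':_index=>"([^"]+)"')
ERROR_TYPE_RE = re.compile(r'"type"=>"([^"]+)"')


def tally_index_failures(lines):
    """Count indexing failures per (index, error type) from log lines."""
    counts = Counter()
    for line in lines:
        if MARKER not in line:
            continue  # ignore everything except the indexing warnings
        index = INDEX_RE.search(line)
        error = ERROR_TYPE_RE.search(line)
        key = (
            index.group(1) if index else "unknown",
            error.group(1) if error else "unknown",
        )
        counts[key] += 1
    return counts
```

Run against the log file from cron (for example `tally_index_failures(open(path))`, where `path` points at the logstash-plain file), the counts could be pushed to whatever monitoring system is already in place, without ever passing through Logstash.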
How do others do this? What metrics can I use to find these problems? How do others manage this in a multi-team, multi-dev type environment?