More general than a Graphite-only Beat, I would suggest a beat that can generate metrics and send them to Logstash (which can in turn forward them to Graphite / InfluxDB / OpenTSDB / other backends).
I love the Nagios / Sensu model: a simple script scheduler knows how often a specific metrics scraper is supposed to run (every 60s, hourly, etc.), and the actual script is left to the operator to implement, or to get from the community (e.g. sensu-community-plugins, in particular the "metrics" plugins).
A runner that executes other scripts is an extremely flexible approach that lets anyone in the community get their metrics in the way that makes the most sense (see the sketch after this list):
- Query for the metric in different ways (read the filesystem, query via HTTP or a custom protocol, run a command, etc.)
- Parse various forms of text output
- Let the operator use whatever language suits them best, if none of the available plugins fits their situation
Of course, this doesn't preclude offering pre-built scripts, or even collection mechanisms other than running external scripts.
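To make the runner model concrete, here is a minimal sketch in Go (the language Beats is written in). The Scraper type, the interval value, and the cat /proc/loadavg stand-in for a community plugin are all hypothetical; a real implementation would also need timeouts, error backoff, and a proper shipper instead of log.Println.

```go
package main

import (
	"bufio"
	"bytes"
	"log"
	"os/exec"
	"time"
)

// Scraper pairs an external metrics script with its run interval,
// mirroring the Nagios / Sensu model described above.
type Scraper struct {
	Name     string
	Command  []string      // the operator-provided script, in any language
	Interval time.Duration // how often the scheduler runs it
}

// run executes the scraper on its schedule and hands each output
// line to sink (e.g. something that ships to Logstash).
func run(s Scraper, sink func(line string)) {
	ticker := time.NewTicker(s.Interval)
	defer ticker.Stop()
	for range ticker.C {
		out, err := exec.Command(s.Command[0], s.Command[1:]...).Output()
		if err != nil {
			log.Printf("%s: %v", s.Name, err)
			continue
		}
		scanner := bufio.NewScanner(bytes.NewReader(out))
		for scanner.Scan() {
			sink(scanner.Text())
		}
	}
}

func main() {
	// Hypothetical example: a trivial command standing in for a
	// community metrics plugin.
	scrapers := []Scraper{
		{Name: "load", Command: []string{"cat", "/proc/loadavg"}, Interval: 60 * time.Second},
	}
	for _, s := range scrapers {
		go run(s, func(line string) { log.Println(line) })
	}
	select {} // block forever; real code would handle shutdown
}
```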
The Sensu community took the unfortunate turn of using the Graphite line protocol directly. Since Graphite doesn't support metadata, you have to encode that metadata into the metric name itself, e.g. mysql.clustername.master.connections X TS, which is hard to extend later without breaking a lot of existing dashboards, because every dashboard query depends on the position of each path segment.
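To illustrate the breakage (the value, timestamp, and the added datacenter dimension below are hypothetical), here is what happens in Graphite's plaintext protocol when a new dimension is needed:

```
# today: cluster and role live at fixed positions in the path
mysql.clustername.master.connections 42 1444040400

# later: adding a datacenter shifts every segment after it,
# so dashboards matching mysql.*.master.connections stop working
mysql.dc1.clustername.master.connections 42 1444040400
```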
I would propose that a Metrics Beat use a more general format, like InfluxDB's or OpenTSDB's. Examples:
- InfluxDB: mysql.connections X TS {"cluster": "clustername", "role": "master"}
- OpenTSDB: mysql.connections X TS cluster=clustername role=master
Both of these can be turned into Graphite format if need be (see the sketch below).
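To back that claim, here is a minimal Go sketch that flattens a tagged metric into a Graphite line. The Metric type, the toGraphite helper, and the appended-tags path layout are illustrative assumptions, not anything Beats or these databases define; real code would also need to escape Graphite-hostile characters in tag values.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Metric is a tagged measurement in the spirit of the
// InfluxDB / OpenTSDB examples above.
type Metric struct {
	Name  string
	Value float64
	TS    int64
	Tags  map[string]string
}

// toGraphite flattens the tags into the dotted path. Tag keys are
// sorted so the resulting layout is deterministic.
func toGraphite(m Metric) string {
	keys := make([]string, 0, len(m.Tags))
	for k := range m.Tags {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	parts := []string{m.Name}
	for _, k := range keys {
		parts = append(parts, m.Tags[k])
	}
	return fmt.Sprintf("%s %g %d", strings.Join(parts, "."), m.Value, m.TS)
}

func main() {
	m := Metric{
		Name:  "mysql.connections",
		Value: 42, // hypothetical value
		TS:    1444040400,
		Tags:  map[string]string{"cluster": "clustername", "role": "master"},
	}
	// Prints: mysql.connections.clustername.master 42 1444040400
	fmt.Println(toGraphite(m))
}
```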
I hope this didn't turn into too much of a rant. Basically, I was on the verge of having my team start writing this tool this week or next, shipping via logstash-forwarder*. So we'll be watching what you're building very closely, and will try to contribute to, and use, Beats if possible.
* Logstash-forwarder/lumberjack is already a secure transport from all of our infrastructure to our ELK cluster, and I don't want metrics arriving via a second, different channel that also needs to be secured, and so on. This approach has been heavily influenced by the ideas in the foundational article by one of the Kafka creators: The Log: What every software engineer should know about real-time data's unifying abstraction