For beats that don't depend on a local system resource, such as a beat for a web API, having some standard options to store state remotely would be a huge win when running on infrastructure without persistent disk storage.
Providing an io.ReadWriter backed by a configurable local, elasticsearch, etcd, or consul storage backend as part of the standard beats framework might be a good way to do it. Developers can be free to read and write their state blobs, and users can have more flexibility in their deployments.
Just to make sure I understand, you are referring to custom Beats, and you want libbeat to never write to local disk, right?
If it could be done in a generic way, we'd be open to having that in libbeat, but I'm not sure how easy it will be. Can you try opening a PoC pull request? It doesn't need to be complete, just enough to demonstrate the concept.
Yes, I'm proposing adding a configurable storage backend for state tracking in libbeat.
So for Filebeat, for example:
What is now filebeat.registry_file: registry would instead become:
state:
  storage_driver: local
  storage_opts:
    path: registry
For a different beat, such as one that retrieves Google Suite OAuth authorizations, you could use something like this and run it on Kubernetes without local storage:
It's an interesting idea. Currently the state store is custom per beat for filebeat and winlogbeat. We hope to change this and have a common API for storing and updating state in libbeat in the future. As a first iteration we still want to go for local files, but considering k8s, for example, we might indeed opt to support other backends. With network-based backends, though, the chance of things going wrong increases as well.