I'm afraid I have to push back here. Reworking the registry support is still a big outstanding topic. It's basically a complete rewrite: tighter integration with libbeat, improved shutdown handling/timing, removing complicated logic from filebeat, more efficient updates to disk (flushing a big registry is very expensive right now), and a generalized entry format to support multiple source types, among other things.
We are aware that many users read the registry file in custom scripts, so we will have to see how we can properly support this use case. Some possible solutions we have in mind:
- an HTTP endpoint (via a unix socket?) to query the complete state
- a 'filebeat registry state' command to query the current state
- an event stream that scripts can subscribe to
- alternative 'backends' that still write the old (inefficient) JSON format
But whatever we build now to query the registry might either complicate the refactoring (more existing features we have to keep backwards compatible) or break if required (in the worst case, be removed).
I would recommend not using the local prospector states. Relying on the per-prospector state is somewhat unsafe: the local prospector state is updated before the events reach the outputs. The individual prospector states are merged into the global registry only after the events/documents have been ACKed by the outputs. That is, the global state is 1) always behind the local prospector states and 2) still available even after a prospector/harvester has finished or its state has been gc'ed. On restart, filebeat also resumes from the global registry state. So if filebeat is restarted, or just blocked by unavailable outputs, you might have used a state value that is not "true". The future of the local prospector registry/state is not clear yet. With the registry rewrite, the local state might no longer be required in its current form.
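The ordering described above can be illustrated with a small Go model. This is not Filebeat's actual registrar code: `Registry`, `Read`, `Ack`, `Restart` and the two-field `FileState` are hypothetical simplifications that only show why the local state runs ahead of the global one and why a restart falls back to the last ACKed offset.

```go
package main

import "sync"

// FileState is a hypothetical, simplified per-file state: a source path and
// the byte offset read so far. Real registry entries carry more fields.
type FileState struct {
	Source string
	Offset int64
}

// Registry models the two levels of state (hypothetical names):
//   - local:  offsets read by prospectors/harvesters, ahead of the outputs
//   - global: offsets whose events the outputs have ACKed (what gets persisted)
type Registry struct {
	mu     sync.Mutex
	local  map[string]int64
	global map[string]int64
}

func NewRegistry() *Registry {
	return &Registry{local: map[string]int64{}, global: map[string]int64{}}
}

// Read records that a harvester has read up to s.Offset. The events are not
// ACKed yet, so only the local state moves.
func (r *Registry) Read(s FileState) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.local[s.Source] = s.Offset
}

// Ack is called once the outputs have acknowledged events up to s.Offset;
// only now does the global state advance.
func (r *Registry) Ack(s FileState) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.global[s.Source] = s.Offset
}

// Restart simulates a filebeat restart: the local state is discarded and
// rebuilt from the global registry, so un-ACKed progress is read again.
func (r *Registry) Restart() {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.local = make(map[string]int64, len(r.global))
	for src, off := range r.global {
		r.local[src] = off
	}
}

// Local returns the local offset for src (0 if unknown).
func (r *Registry) Local(src string) int64 {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.local[src]
}

// Global returns the ACKed offset for src (0 if unknown).
func (r *Registry) Global(src string) int64 {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.global[src]
}
```

While the outputs are blocked, `Local` keeps advancing and `Global` stays at the last ACKed offset. After `Restart`, the local offset drops back to the global one, which is exactly why a value a script took from the local state may turn out not to be "true".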