registry_flush controls when, and how often, the registry is serialized to disk. All state changes are buffered in memory until the flush happens. Currently the registry keeps all state in memory, and updating the registry file amounts to writing a snapshot of that current state. With registry_flush: 0 (the default), each ACKed batch of events triggers a snapshot.
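To make the buffering concrete, here is a minimal Python sketch of the model described above. It is not filebeat's implementation. The `Registry` class, `on_ack`, `maybe_flush` and `flush_interval` are hypothetical names, offsets are simplified to one integer per file, and the temp-file-plus-`os.replace` write is a choice made for this sketch only. In a real process a timer would call `maybe_flush`; here the caller does it explicitly.

```python
import json
import os
import tempfile
from typing import Dict, Optional


class Registry:
    """Hypothetical model of the registry (not filebeat's code).

    State lives in memory; writing the registry file means dumping a full
    snapshot of that state. flush_interval stands in for registry_flush
    (seconds); 0 means a snapshot after every ACKed batch.
    """

    def __init__(self, path: str, flush_interval: float = 0.0) -> None:
        self.path = path
        self.flush_interval = flush_interval
        self.state: Dict[str, int] = {}             # file -> offset of last ACKed event
        self.pending_since: Optional[float] = None  # time of oldest unflushed change

    def on_ack(self, updates: Dict[str, int], now: float) -> None:
        # An ACKed batch only updates the in-memory state...
        self.state.update(updates)
        if self.pending_since is None:
            self.pending_since = now
        # ...and reaches disk only once the flush interval has elapsed.
        self.maybe_flush(now)

    def maybe_flush(self, now: float) -> None:
        # Meant to be called on every ACK and periodically (e.g. from a timer).
        if self.pending_since is not None and now - self.pending_since >= self.flush_interval:
            self.flush()

    def flush(self) -> None:
        # Write the whole state to a temp file, then atomically replace the
        # registry file, so the file always holds one complete snapshot.
        directory = os.path.dirname(self.path) or "."
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as f:
            json.dump(self.state, f)
        os.replace(tmp_path, self.path)
        self.pending_since = None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        reg = Registry(os.path.join(d, "registry"), flush_interval=5)
        reg.on_ack({"/var/log/app.log": 1024}, now=0.0)
        print(os.path.exists(reg.path))  # False: change is only buffered in memory
        reg.maybe_flush(now=5.0)
        with open(reg.path) as f:
            print(json.load(f))          # {'/var/log/app.log': 1024}
```

With `flush_interval=0` every `on_ack` writes a snapshot immediately, which mirrors the registry_flush: 0 default.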
State updates include file renames and the offsets of the last sent events. If the state has not been flushed yet and filebeat is restarted, filebeat has to send already published events again. Filebeat flushes the registry on a normal shutdown, but if the machine or filebeat crashes, or if filebeat is forcibly shut down, the final registry flush is missing, which leads to duplicates. Since some events can still be in the pipeline (not yet ACKed), also use shutdown_timeout to reduce the chance of duplicates.
There is no 'perfect' value for registry_flush. It is a trade-off between the chance of duplicates after a crash and overall disk I/O, and it is a risk you have to accept as a user. The number of duplicates you might see depends on the event rate and on registry_flush. A rough estimate is:
duplicates ≈ avg eps * registry_flush (average events per second times the flush interval in seconds).
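A small Python sketch of the trade-off, assuming registry_flush is expressed in seconds. `estimate_duplicates` is just the rough formula above. `max_snapshots_per_hour` is a hypothetical helper that assumes a non-zero registry_flush writes at most one snapshot per interval, and `acked_batches_per_s` is an invented input; neither comes from filebeat. Events still in the pipeline (not yet ACKed) come on top of the estimate and are not modelled.

```python
def estimate_duplicates(avg_eps: float, registry_flush_s: float) -> float:
    """Rough number of events re-sent after a crash: everything ACKed
    since the last registry snapshot (avg eps * registry_flush)."""
    return avg_eps * registry_flush_s


def max_snapshots_per_hour(registry_flush_s: float, acked_batches_per_s: float) -> float:
    """Upper bound on registry writes per hour (assumption: with a non-zero
    registry_flush at most one snapshot is written per interval)."""
    if registry_flush_s == 0:
        # registry_flush: 0 -> one snapshot per ACKed batch
        return acked_batches_per_s * 3600
    return min(acked_batches_per_s, 1 / registry_flush_s) * 3600


if __name__ == "__main__":
    avg_eps = 2000      # hypothetical event rate
    batches_per_s = 4   # hypothetical number of ACKed batches per second
    for flush_s in (0, 1, 5, 10):
        dups = estimate_duplicates(avg_eps, flush_s)
        writes = max_snapshots_per_hour(flush_s, batches_per_s)
        print(f"registry_flush={flush_s}s: ~{dups:.0f} duplicates, "
              f"<= {writes:.0f} snapshots/hour")
```

Raising registry_flush lowers the number of registry writes roughly in proportion, while the worst-case duplicate count grows linearly with it.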
A missing flush makes filebeat restart from older state. On startup, filebeat rebuilds its in-memory registry state from the registry file, so it continues processing from those 'old' offsets.
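The crash-and-restart behaviour can be sketched as a toy simulation in Python. This is an illustration under simplifying assumptions, not filebeat code: `simulate_crash_restart` and its parameters are hypothetical, offsets are line indexes rather than byte offsets, and `flush_every` (snapshot every N ACKed batches) stands in for a time-based registry_flush.

```python
from typing import List, Tuple


def simulate_crash_restart(lines: List[str], batch_size: int,
                           flush_every: int, crash_after: int) -> Tuple[List[str], int]:
    """Publish lines in ACKed batches, crash after `crash_after` batches,
    then restart from the last on-disk snapshot.

    Returns (everything delivered to the output, number of duplicates)."""
    delivered: List[str] = []
    in_memory_offset = 0  # offset after the last ACKed event (lost on crash)
    on_disk_offset = 0    # offset stored in the last registry snapshot
    batches = 0
    while in_memory_offset < len(lines) and batches < crash_after:
        batch = lines[in_memory_offset:in_memory_offset + batch_size]
        delivered.extend(batch)            # output received and ACKed the batch
        in_memory_offset += len(batch)
        batches += 1
        if batches % flush_every == 0:
            on_disk_offset = in_memory_offset  # registry snapshot written
    # Crash: the in-memory state is gone. On restart the registry file is
    # loaded and reading resumes from the older on-disk offset.
    delivered.extend(lines[on_disk_offset:])
    return delivered, in_memory_offset - on_disk_offset


if __name__ == "__main__":
    log = [f"line {i}" for i in range(10)]
    out, dups = simulate_crash_restart(log, batch_size=2, flush_every=2, crash_after=3)
    print(dups)  # 2: "line 4" and "line 5" are sent twice
```

With `flush_every=1` (a snapshot after every ACKed batch, similar to registry_flush: 0) the same run produces no duplicates, since the snapshot always matches the in-memory state at the moment of the crash.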
That the setting is not documented is a bug. Please open an issue here. Thanks.