There are other architectures in which data loss is possible. Broadly speaking, there are two kinds of sources:
- Persistent and replayable - files (Filebeat), Kafka, RabbitMQ, SQL databases, S3, etc.
- Transient - networked machine-to-LS connections with no spooling or buffering in between. This covers devices such as firewalls, and ephemeral machines such as Docker containers whose design does not preserve log files after the instance is destroyed. (A minimal config sketch contrasting the two follows this list.)
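For illustration, here is a minimal pipeline input sketch contrasting the two categories. The broker address, topic name, and port are placeholders, and the option names assume a recent version of the Kafka input plugin:

```
input {
  # Persistent and replayable: Kafka retains the log, so offsets can be
  # rewound and the same data consumed again after a failure.
  kafka {
    bootstrap_servers => "localhost:9092"
    topics            => ["app-logs"]
    group_id          => "logstash"
  }
  # Transient: once the sender closes the connection, anything LS has not
  # yet read is gone - there is no spool to replay from.
  tcp {
    port => 5140
  }
}
```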
However, bugs have to be taken into account. If, say, a Kafka input has pulled some data and the offset has been committed, but a bug causes LS to lose the event before it reaches its destination (ES or the dead letter queue), then that data cannot be replayed without human intervention.
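The window for that kind of loss can be narrowed (not eliminated) with the persistent queue and the dead letter queue. A minimal logstash.yml sketch, assuming a release where both features are available; the size is illustrative, not a recommendation:

```yaml
# logstash.yml
queue.type: persisted            # spool in-flight events to disk instead of memory
queue.max_bytes: 4gb             # illustrative cap on the on-disk queue
dead_letter_queue.enable: true   # keep events ES rejects rather than dropping them
```

With the persistent queue enabled, events already read from the source survive a LS crash and are re-processed on restart.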
These bugs most often manifest as a direct result of high data inflow against LS/ES configurations that cannot keep up with the volume.
Not every installation is designed for high-volume inflow from the get-go. Some people start with a configuration that performs well at a test/trial volume and then gradually connect more and more parts of the business operation to the ingest infrastructure, until it can no longer keep up. In other cases the inflow exhibits peaks of high volume, occurring daily or as the result of a special event.
To be honest, Logstash does not yet have a great story around pre-emptive alerting when such a limit is approaching (it's coming); for now, people have to roll their own. There is no Scotty to warn "She cannae take any more, Captain!".
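As one way of rolling your own, the monitoring API on port 9600 exposes pipeline event counts that can be polled for a crude backlog alarm. A rough sketch, assuming the 5.x single-pipeline stats shape (field paths differ across versions) and a purely illustrative threshold:

```python
# Poll the Logstash monitoring API and warn when the pipeline falls behind.
import json
import time
import urllib.request

STATS_URL = "http://localhost:9600/_node/stats"
LAG_THRESHOLD = 50_000  # hypothetical: alert when in - out exceeds this

def pipeline_events():
    """Return (events received, events emitted) from the node stats."""
    with urllib.request.urlopen(STATS_URL) as resp:
        stats = json.load(resp)
    events = stats["pipeline"]["events"]
    return events["in"], events["out"]

while True:
    received, emitted = pipeline_events()
    backlog = received - emitted
    if backlog > LAG_THRESHOLD:
        print(f"WARNING: pipeline backlog at {backlog} events")  # wire to your alerting
    time.sleep(30)
```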
We, Elastic and the Logstash team, are putting enormous effort into turning Logstash into a turn-key, high-volume-capable product, but the code surface area (and the set of configuration permutations) is large.