For logging in AWS EC2, I'm testing the robustness of the Filebeat → Logstash → Elasticsearch chain. I have one AMI with an application + Filebeat, one with Logstash, and one with Elasticsearch + Kibana. With the application running, I reboot one of these 3 machines and observe what happens once it's back up.
The good news is that I never lose a single line of log. The less good news is that most of the time I end up with duplicated logs in Elasticsearch. Typically I generate X lines (let's say 100K) and I find X + a few hundred in ES.
Why does this happen? Is there a way to prevent it (or to remove the duplicates afterwards), given that my log lines have no unique id?
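One idea I'm considering (not sure if it's the right approach) is deriving a deterministic document id from the event content itself, so that a re-delivered event overwrites the existing document instead of creating a duplicate. A minimal sketch in Python of what I mean, using hypothetical event fields (`source`, `offset`, `message`) that together should identify one log line:

```python
import hashlib
import json

def event_id(event: dict) -> str:
    """Derive a deterministic document id from the event content.

    Hypothetical fields: 'source' (file path), 'offset' (byte offset
    in the file) and 'message' should together identify one log line,
    so a re-delivered event hashes to the same Elasticsearch _id and
    overwrites the existing document instead of duplicating it.
    """
    key = json.dumps(
        {k: event[k] for k in ("source", "offset", "message")},
        sort_keys=True,
    )
    return hashlib.sha256(key.encode("utf-8")).hexdigest()

# Two deliveries of the same line produce the same id.
e = {"source": "/var/log/app.log", "offset": 1234, "message": "user logged in"}
assert event_id(e) == event_id(dict(e))
```

The same fingerprinting could presumably be done inside the pipeline (e.g. in a Logstash filter) and the result used as the Elasticsearch document id, turning duplicate indexing into an idempotent overwrite.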