Filebeat version: 1.1.1
Logstash version: 2.1.0
Logstash multiline codec: 2.0.9
Logstash beats plugin: 2.1.4
Elasticsearch version: 2.1.0
I am shipping logs from a Linux server using Filebeat. The only configuration of note is the use of the multiline option. Filebeat ships the events to Logstash.
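For concreteness, the relevant part of my filebeat.yml is shaped like this (the paths and the multiline pattern below are placeholders, not my real values):

```yaml
filebeat:
  prospectors:
    -
      paths:
        - /var/log/myapp/*.log
      input_type: log
      # Join continuation lines (e.g. stack traces) onto the preceding event
      multiline:
        pattern: '^\['       # lines that do NOT start with '[' are continuations
        negate: true
        match: after

output:
  logstash:
    hosts: ["logstash.example.com:5044"]
```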
Logstash runs a series of filters on the incoming events (substitutions, field additions, timestamp extraction) and then indexes them into Elasticsearch.
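The filter section is along these lines (the patterns and field names here are only stand-ins for the real ones):

```
filter {
  # substitutions and field additions
  mutate {
    gsub      => ["message", "\t", " "]
    add_field => { "environment" => "production" }
  }
  # pull the timestamp out of the message and use it as @timestamp
  grok {
    match => { "message" => "^%{TIMESTAMP_ISO8601:log_timestamp} %{GREEDYDATA:log_message}" }
  }
  date {
    match => ["log_timestamp", "ISO8601"]
  }
}
```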
When I comment out the Elasticsearch output in the Logstash configuration file and instead just dump the events to a file with the rubydebug codec, the number of events exactly matches the number of events in the logs being harvested on the Filebeat server.
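The output section looks like this; for the test above I simply comment out the elasticsearch block (hosts and paths are placeholders):

```
output {
  # debug copy used to count events
  file {
    path  => "/tmp/logstash-events.debug"
    codec => rubydebug
  }
  elasticsearch {
    hosts => ["es.example.com:9200"]
    index => "logs-%{+YYYY.MM.dd}"
  }
}
```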
When I enable the Elasticsearch output, a small percentage of events is duplicated. I have verified this with Kibana: the filename, offset, and message content are exactly the same. This only happens when Elasticsearch is an output. I also left the rubydebug file output enabled, and the duplicates appear in that file as well, so the duplication is definitely happening in Logstash or the forwarder, not just in Elasticsearch.
I searched for a solution and found a suggestion to use fingerprinting to prevent duplicate events in Elasticsearch. However, I would like to understand why this occurs in the first place. Is this a known issue? Is it a bug, or is something amiss with my configuration? If it is a bug, could you please point me to the GitHub issue number?
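For reference, the suggested fingerprint approach would look roughly like this; this is a sketch of what I would add, not something I am running yet (the key, hash method, and field choices are placeholders):

```
filter {
  # hash the event content so duplicates map to the same document id
  fingerprint {
    source => "message"
    target => "[@metadata][fingerprint]"
    method => "SHA1"
    key    => "dedup"
  }
}

output {
  elasticsearch {
    hosts       => ["es.example.com:9200"]
    index       => "logs-%{+YYYY.MM.dd}"
    document_id => "%{[@metadata][fingerprint]}"
  }
}
```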
What kind of performance cost will I incur if I turn on fingerprinting?