Duplicate events in Filebeat + Logstash + Elasticsearch pipeline

Filebeat version: 1.1.1
Logstash version: 2.1.0
Logstash multiline codec: 2.0.9
Logstash beats plugin: 2.1.4
Elasticsearch version: 2.1.0

I am shipping logs from a Linux server using Filebeat. The only configuration of note here is the use of the multiline option. Filebeat ships the events to Logstash.
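Here is a simplified sketch of the Filebeat side; the paths, multiline pattern, and Logstash host below are placeholders rather than my real values:

```
filebeat:
  prospectors:
    -
      paths:
        - /var/log/myapp/*.log            # placeholder path
      input_type: log
      multiline:
        # placeholder pattern: lines that do not start with a timestamp
        # are appended to the previous event
        pattern: '^[0-9]{4}-[0-9]{2}-[0-9]{2}'
        negate: true
        match: after

output:
  logstash:
    hosts: ["logstash.example.com:5044"]  # placeholder host
```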

Logstash runs a series of filters on the incoming events (substitutions, field additions, timestamp extraction) and then indexes them into Elasticsearch.
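The pipeline is roughly along these lines; the hosts, patterns, and field values are placeholders rather than the actual config:

```
input {
  beats {
    port => 5044
  }
}

filter {
  mutate {
    gsub      => [ "message", "\t", " " ]           # substitution (placeholder)
    add_field => { "environment" => "production" }  # field addition (placeholder)
  }
  grok {
    match => { "message" => "^%{TIMESTAMP_ISO8601:log_timestamp}" }
  }
  date {
    match => [ "log_timestamp", "ISO8601" ]         # timestamp extraction
  }
}

output {
  elasticsearch {
    hosts => ["es.example.com:9200"]                # placeholder host
    index => "logstash-%{+YYYY.MM.dd}"
  }
}
```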

When I comment out the Elasticsearch output in the Logstash configuration file and instead just dump events to a file with the rubydebug codec, the number of events exactly matches the number of events in the logs being harvested on the Filebeat server.
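For that comparison I swap the output section to something like this (the path is a placeholder):

```
output {
  # elasticsearch { ... }   commented out for the comparison
  file {
    path  => "/tmp/pipeline-events.log"   # placeholder path
    codec => rubydebug
  }
}
```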

When I use Elasticsearch as an output, I see that a small percentage of events have been duplicated. I have verified this with Kibana: the filename, offset, and message content are exactly the same. This only happens when Elasticsearch is an output. I also left the rubydebug output enabled and verified the duplication in the output file itself, so it's definitely happening at Logstash or the forwarder, not just in Elasticsearch.

I searched for a solution and found a suggestion to use fingerprinting to prevent duplication of events in Elasticsearch. However, I would like to know why this occurs in the first place. Is this a known issue? Is it a bug, or is something amiss with my configuration? If it's a bug, could you please point me to the GitHub bug #?

What kind of performance cost will I incur if I turn on fingerprinting?
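For reference, the suggestion I found looks roughly like the sketch below: a fingerprint filter hashes the message into a metadata field, which is then used as the document_id on the Elasticsearch output, so a re-sent event overwrites the existing document instead of creating a duplicate. The host and key are placeholders.

```
filter {
  fingerprint {
    source => "message"
    target => "[@metadata][fingerprint]"
    method => "SHA1"
    key    => "any-static-string"           # placeholder HMAC key
  }
}

output {
  elasticsearch {
    hosts       => ["es.example.com:9200"]  # placeholder host
    document_id => "%{[@metadata][fingerprint]}"
  }
}
```

As I understand it, the cost would be one hash computation per event in Logstash, plus some extra work on the Elasticsearch side because explicitly supplied IDs have to be checked against existing documents at index time, but I would appreciate confirmation.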

I don't think it is the intention of any plugin to duplicate data; usually the cause is a nuance in the environment producing the log stream or in the configuration of the pipeline.

Can you share your configs and elaborate on the application sources that are generating the files? There are so many variables that we need more context to help properly.