Filebeat version: 1.1.1
Logstash version: 2.1.0
Logstash multiline codec: 2.0.9
Logstash beats plugin: 2.1.4
Elasticsearch version: 2.1.0
I am shipping logs from a Linux server using Filebeat. The only configuration of note is the use of the multiline option. Filebeat ships the events to Logstash.
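For concreteness, the relevant part of my filebeat.yml is shaped like this (the paths and the multiline pattern below are placeholders, not my real values):

```yaml
filebeat:
  prospectors:
    -
      paths:
        - /var/log/myapp/*.log
      input_type: log
      # Join continuation lines (e.g. stack traces) onto the preceding event
      multiline:
        pattern: '^\['       # lines that do NOT start with '[' are continuations
        negate: true
        match: after

output:
  logstash:
    hosts: ["logstash.example.com:5044"]
```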
Logstash runs a series of filters on the incoming events (substitutions, field additions, timestamp extraction) and then indexes them into Elasticsearch.
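The filter section is along these lines (the patterns and field names here are only stand-ins for the real ones):

```
filter {
  # substitutions and field additions
  mutate {
    gsub      => ["message", "\t", " "]
    add_field => { "environment" => "production" }
  }
  # pull the timestamp out of the message and use it as @timestamp
  grok {
    match => { "message" => "^%{TIMESTAMP_ISO8601:log_timestamp} %{GREEDYDATA:log_message}" }
  }
  date {
    match => ["log_timestamp", "ISO8601"]
  }
}
```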
When I comment out the Elasticsearch output in the Logstash configuration file and instead just dump the events to a file with the rubydebug codec, the number of events exactly matches the number of events in the logs being harvested on the Filebeat server.
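The output section looks like this; for the test above I simply comment out the elasticsearch block (hosts and paths are placeholders):

```
output {
  # debug copy used to count events
  file {
    path  => "/tmp/logstash-events.debug"
    codec => rubydebug
  }
  elasticsearch {
    hosts => ["es.example.com:9200"]
    index => "logs-%{+YYYY.MM.dd}"
  }
}
```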
When I enable the Elasticsearch output, a small percentage of events is duplicated. I have verified this with Kibana: the filename, offset, and message content are exactly the same. This only happens when Elasticsearch is an output. I also left the rubydebug file output enabled, and the duplicates appear in that file as well, so the duplication is definitely happening in Logstash or the forwarder, not just in Elasticsearch.
I searched for a solution and found a suggestion to use fingerprinting to prevent duplicate events in Elasticsearch. However, I would like to understand why this occurs in the first place. Is this a known issue? Is it a bug, or is something amiss with my configuration? If it is a bug, could you please point me to the GitHub issue number?
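For reference, the suggested fingerprint approach would look roughly like this; this is a sketch of what I would add, not something I am running yet (the key, hash method, and field choices are placeholders):

```
filter {
  # hash the event content so duplicates map to the same document id
  fingerprint {
    source => "message"
    target => "[@metadata][fingerprint]"
    method => "SHA1"
    key    => "dedup"
  }
}

output {
  elasticsearch {
    hosts       => ["es.example.com:9200"]
    index       => "logs-%{+YYYY.MM.dd}"
    document_id => "%{[@metadata][fingerprint]}"
  }
}
```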
What kind of performance cost will I incur if I turn on fingerprinting?