How to remove duplicate events in Logstash

Hi,

I really need your help.

I am using Filebeat to ship logs, processing them with Logstash, and storing them in Elasticsearch as well as in a CSV file, but I can see duplicate events in both the CSV file and Elasticsearch. Could anyone help me with this?

As far as I am aware, events in Logstash are completely separate from each other.

There are two options I can think of though.

  1. Remove the duplicates from the source data before sending it into Logstash. I assume this isn't possible in your case.
  2. Use a custom document_id so that duplicate events are overwritten.

https://www.elastic.co/guide/en/logstash/current/plugins-outputs-elasticsearch.html#plugins-outputs-elasticsearch-document_id

Basically, when you insert a document into Elasticsearch, it creates its own document ID. Think of it as a primary key in a database. If you want, you can set your own document ID in the Logstash Elasticsearch output. Now if a duplicate event comes in, it will overwrite and update the existing document instead of creating a new one. In your case all of the data will be the same, but it will stop the duplicates. Your data just has to have some kind of unique identifier.
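For example, a minimal sketch of such an output (the host, index name, and the `[transaction_id]` field are placeholder assumptions, not taken from your setup):

```
output {
  elasticsearch {
    hosts => ["localhost:9200"]   # placeholder host
    index => "filebeat-logs"      # placeholder index name
    # Use a field that uniquely identifies the event as the document ID.
    # A second event with the same ID updates the existing document
    # instead of being indexed as a new one.
    document_id => "%{[transaction_id]}"
  }
}
```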

Thank you @bhatch for the reply.

Now I am able to remove the duplicates in Elasticsearch using the fingerprint filter, but I am still unable to remove the duplicates being written to the CSV file.
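For reference, a fingerprint-based setup along these lines (the source field and hash method here are illustrative, not necessarily what I used):

```
filter {
  fingerprint {
    # Hash the field(s) that make an event unique; duplicate
    # events produce the same fingerprint value.
    source => "message"
    target => "[@metadata][fingerprint]"
    method => "SHA256"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]   # placeholder host
    # Duplicate events share a fingerprint, so they overwrite
    # the same document instead of creating new ones.
    document_id => "%{[@metadata][fingerprint]}"
  }
}
```

As far as I can tell, this only helps on the Elasticsearch side: the csv output plugin simply appends a line per event to the file and has no equivalent of document_id, so duplicates would have to be removed before the events reach the output or cleaned from the file afterwards.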
