I am using Filebeat to ship logs, processing them with Logstash, and storing them in Elasticsearch as well as in a CSV file, but I can see duplicate events in both the CSV file and Elasticsearch. Could anyone help me with this?
Basically, when you insert a document into Elasticsearch it creates its own document ID. Think of it as a primary key in a database. If you want, you can set your own document ID in the Logstash Elasticsearch output. Then, if a duplicate event comes in, it will overwrite and update the existing document instead of creating a new one. In your case all of the data will be the same, but it will stop the duplication. Your CSV file just has to have some kind of unique identifier you can deduplicate on, since the csv output appends every event it receives.
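For reference, here is a minimal pipeline sketch of that idea. It is not your exact config: the hosts, index name, and the choice of the `message` field as the fingerprint source are assumptions you would adapt to your setup.

```
filter {
  # Derive a deterministic ID from the event content, so a replayed
  # duplicate produces the same fingerprint as the original event.
  fingerprint {
    source => ["message"]
    target => "[@metadata][fingerprint]"
    method => "SHA256"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "filebeat-logs"
    # Using the fingerprint as the document ID means a duplicate event
    # overwrites the existing document instead of creating a new one.
    document_id => "%{[@metadata][fingerprint]}"
  }
}
```

Putting the fingerprint under `[@metadata]` keeps it out of the stored document and out of your CSV columns, while still making it available to the output stage.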