Currently I'm using two separate index patterns (logstash-* and filebeat-*) with separate dashboards. The live logs are written to a local machine, where Filebeat reads them and ships them to Elasticsearch, which Kibana then visualizes. Every night those same logs get copied to a different location that Logstash watches and ingests. By default Filebeat puts everything into a filebeat-yyyy-mm index while Logstash uses a logstash-yyyy-mm index, so the data is identical, but because of the different index names both copies show up in Kibana and aren't treated as duplicates. Any ideas on how to remove the duplicate data? Ideally I would like to get rid of the Filebeat data after Logstash does its ingest.
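To make that concrete, what I'd like to automate after the nightly Logstash run is something along these lines (the host and the monthly index name are just placeholders for my setup):

```
# Drop the Filebeat copy of a month's logs once Logstash has re-ingested them
# into logstash-2024-01; adjust the host and index name to the real cluster.
curl -X DELETE "http://localhost:9200/filebeat-2024-01"
```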
Is the goal to remove the duplicate data from Elasticsearch, or just not have it visible in Kibana?
If it's from Elasticsearch, the easiest way I can think of is to run all ingestion through the same Logstash pipeline. Filebeat can send its logs to Logstash, and the Elasticsearch output can set document_id from an existing ID field, or from something generated with the fingerprint filter (https://www.elastic.co/guide/en/logstash/current/plugins-filters-fingerprint.html), so each event gets a consistent ID and re-ingesting the same line overwrites the existing document instead of creating a duplicate.
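Not a drop-in config, just a minimal sketch of the idea, assuming Filebeat ships to a Beats input on port 5044 and that the message field (plus whatever else you hash) uniquely identifies a log line:

```
input {
  beats {
    port => 5044
  }
}

filter {
  # Hash the fields that uniquely identify a log line; add more sources
  # (e.g. the file path) if message alone isn't unique enough.
  fingerprint {
    source => ["message"]
    method => "SHA256"
    target => "[@metadata][fingerprint]"
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "logstash-%{+YYYY-MM}"
    # Same fingerprint => same _id, so a re-ingested line overwrites
    # the existing document instead of duplicating it.
    document_id => "%{[@metadata][fingerprint]}"
  }
}
```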
Can you add a little more detail about the nightly Logstash job vs. Filebeat? Maybe we can use one or the other and keep the raw log files in storage in case we ever need to repopulate.
Both, if we can. Right now Kibana shows the same data in both the logstash-* and filebeat-* indices.
Nightly, the live log files get copied into a directory that Logstash watches for Apache log files.
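The relevant part of that nightly pipeline is roughly this; the paths and index name are placeholders for my actual setup:

```
input {
  file {
    # Directory the nightly copy job drops the Apache logs into
    path => "/data/apache-archive/*.log"
    start_position => "beginning"
    sincedb_path => "/var/lib/logstash/sincedb-apache-archive"
  }
}

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
}

output {
  elasticsearch {
    hosts => ["http://localhost:9200"]
    index => "logstash-%{+YYYY-MM}"
  }
}
```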