But if a duplicate appears inside the log file itself, this filtering prevents it as well.
I need to prevent duplicate data that can arise in some cases, such as a Filebeat restart, but if the log file itself contains duplicate data, I want to allow it through.
Yes, I don't have a problem with the Logstash pipeline configuration itself. The problem is how to prevent duplicate data when something like a Filebeat restart happens, without also suppressing duplicates that genuinely appear in the logs.
Some cases can cause data duplication, such as Filebeat restarting. To control this I used the fingerprint filter, but this approach prevents all duplication, even when the log file itself contains duplicate lines.
I want duplicates caused by Filebeat (or anything else in the pipeline) to be prevented, but duplicates that come from the log file itself to be kept.
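For illustration, this is roughly the kind of pipeline that behaves this way; the hosts, index name, and hash method are placeholders, not my exact configuration. Fingerprinting only the message and using the hash as the document ID deduplicates every identical line, no matter where it came from:

```
filter {
  fingerprint {
    source => "message"                  # hash only the raw log line
    target => "[@metadata][fingerprint]"
    method => "SHA256"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]          # placeholder host
    index => "logs"                      # placeholder index
    # Identical lines yield identical IDs, so every repeat becomes an
    # update instead of a new document -- even legitimate repeats.
    document_id => "%{[@metadata][fingerprint]}"
  }
}
```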
Filebeat should be able to handle restarts without duplicating a lot of data. What type of storage are you reading from? How is your Filebeat configured?
In your example you are calculating a fingerprint based on the contents of the log line. Identical log lines in log files will therefore result in the same fingerprint and cause updates in Elasticsearch. You could add the filename to the string you use to determine the fingerprint and this would allow the same log line from different files to be inserted without resulting in updates.
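As a sketch, assuming Filebeat 7+ where the source file path is shipped in [log][file][path] (older versions put it in the source field), that could look like:

```
filter {
  fingerprint {
    # Concatenate the file path with the line before hashing, so the
    # same line appearing in two different files gets two different
    # fingerprints and both documents are kept.
    source              => ["[log][file][path]", "message"]
    concatenate_sources => true
    target              => "[@metadata][fingerprint]"
    method              => "SHA256"
  }
}
```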
Filebeat adds the offset of each log line to the event, so you could include this when calculating the fingerprint. I do not believe the Logstash file input plugin is able to do this.
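Including the offset would mean a re-delivered line (same file, same position, e.g. after a restart) still collides with the document already indexed, while a genuinely repeated line further down the file has a different offset and therefore a new fingerprint. A sketch assuming Filebeat 7+, where the byte offset arrives in [log][offset]:

```
filter {
  fingerprint {
    # path + offset + line identifies one physical occurrence of a
    # line: pipeline-level re-delivery dedupes, but genuine repeats
    # in the file (different offsets) are inserted normally.
    source              => ["[log][file][path]", "[log][offset]", "message"]
    concatenate_sources => true
    target              => "[@metadata][fingerprint]"
    method              => "SHA256"
  }
}

output {
  elasticsearch {
    hosts       => ["localhost:9200"]    # placeholder host
    document_id => "%{[@metadata][fingerprint]}"
  }
}
```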