But if a duplicate appears inside the log file itself, this filtering prevents it as well.
I need to prevent duplicate data that can arise in some cases, such as a Filebeat restart, but if the log file itself contains duplicate data, I want to allow it through.
Yes, I don't have a problem with the Logstash pipeline configuration itself. The problem is how to prevent duplicate data when something like a Filebeat restart happens, without also suppressing duplicates that genuinely appear in the logs.
Some cases can cause data duplication, such as Filebeat restarting. To control this I used the fingerprint filter, but this approach prevents all duplication, even when the log file itself contains duplicate lines.
I want duplicates caused by Filebeat (or anything else in the pipeline) to be prevented, but duplicates that come from the log file itself to be kept.
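For illustration, this is roughly the kind of pipeline that behaves this way; the hosts, index name, and hash method are placeholders, not my exact configuration. Fingerprinting only the message and using the hash as the document ID deduplicates every identical line, no matter where it came from:

```
filter {
  fingerprint {
    source => "message"                  # hash only the raw log line
    target => "[@metadata][fingerprint]"
    method => "SHA256"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]          # placeholder host
    index => "logs"                      # placeholder index
    # Identical lines yield identical IDs, so every repeat becomes an
    # update instead of a new document -- even legitimate repeats.
    document_id => "%{[@metadata][fingerprint]}"
  }
}
```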
Filebeat should be able to handle restarts without duplicating a lot of data. What type of storage are you reading from? How is your Filebeat configured?
In your example you are calculating a fingerprint based on the contents of the log line. Identical log lines in log files will therefore result in the same fingerprint and cause updates in Elasticsearch. You could add the filename to the string you use to determine the fingerprint and this would allow the same log line from different files to be inserted without resulting in updates.
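As a sketch, assuming Filebeat 7+ where the source file path is shipped in [log][file][path] (older versions put it in the source field), that could look like:

```
filter {
  fingerprint {
    # Concatenate the file path with the line before hashing, so the
    # same line appearing in two different files gets two different
    # fingerprints and both documents are kept.
    source              => ["[log][file][path]", "message"]
    concatenate_sources => true
    target              => "[@metadata][fingerprint]"
    method              => "SHA256"
  }
}
```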
Filebeat adds the offset of each log line to the event, so you could include this when calculating the fingerprint. I do not believe the Logstash file input plugin is able to do this.
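Including the offset would mean a re-delivered line (same file, same position, e.g. after a restart) still collides with the document already indexed, while a genuinely repeated line further down the file has a different offset and therefore a new fingerprint. A sketch assuming Filebeat 7+, where the byte offset arrives in [log][offset]:

```
filter {
  fingerprint {
    # path + offset + line identifies one physical occurrence of a
    # line: pipeline-level re-delivery dedupes, but genuine repeats
    # in the file (different offsets) are inserted normally.
    source              => ["[log][file][path]", "[log][offset]", "message"]
    concatenate_sources => true
    target              => "[@metadata][fingerprint]"
    method              => "SHA256"
  }
}

output {
  elasticsearch {
    hosts       => ["localhost:9200"]    # placeholder host
    document_id => "%{[@metadata][fingerprint]}"
  }
}
```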