Duplicate Entries of Log data

(Niket Vyas) #1


I have multiple copies of a single log entry in Elasticsearch. For example, my original log file has one log entry X with the timestamp 12:55:03:234, but in Elasticsearch there are several copies of X for that timestamp (the only difference is the _id). The number of duplicates varies from one log entry to another, and some entries have no duplicates at all.
Please help me understand why this is happening.
How can I remove the duplicate log entries from Elasticsearch?


(Christian Dahlqvist) #2

Have a look at this blog post. What does your ingest pipeline look like?

(Gautam) #3


I have the same issue. Here is my Logstash config:

input {
	file {
		type => "SystemError"
		path => "/data/others/*"
		start_position => "beginning"
		sincedb_path => "/dev/null"

		codec => multiline {
			pattern => "^(\s | \[%{DATA}] %{BASE16NUM} (?<shortname>\b[A-Za-z0-9\$]{2,}\b)%{SPACE}%{WORD}%{SPACE}\tat)"
			what => "previous"
		}
	}
}

filter {
	grok {
		match => { "message" => "\[%{DATA:timestamp}] %{BASE16NUM:threadID} (?<shortname>\b[A-Za-z0-9\$]{2,}\b)%{SPACE}%{WORD:loglevel}%{SPACE} %{GREEDYDATA:message}" }
		overwrite => [ "message" ]
	}

	date {
		match => ["timestamp", "M/dd/yy HH:mm:ss:SSS zZZ"]
	}

	fingerprint {
		source => ["@timestamp", "threadID", "message"]
		concatenate_sources => true
		target => "%{[@metadata][fingerprint]}"
		method => "MURMUR3"
	}
}

output {
#	stdout { codec => rubydebug }

	elasticsearch {
		hosts => ["x.x.x.x"]
		user => 'elastic'
		password => 'xxxx'
		document_id => "%{[@metadata][fingerprint]}"
		index => "logstash-%{+YYYY.MM.dd}"
		manage_template => false
	}
}

Please help. The code is not working.


(Niket Vyas) #4

I will go through the blog post and let you know if the problem gets solved.


(Christian Dahlqvist) #5

Setting sincedb_path => "/dev/null" means that files will be reprocessed every time you restart Logstash, so I am not surprised you are seeing duplicates.
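
For illustration, a file input that remembers its read position across restarts might look like the sketch below. The sincedb location is a hypothetical path and not something taken from this thread; any location Logstash can write to should work, and leaving sincedb_path out entirely lets Logstash fall back to its default location.

```conf
input {
	file {
		path => "/data/others/*"
		# Only applies to files Logstash has not seen before.
		start_position => "beginning"
		# Hypothetical path (an assumption for this sketch); it must be writable by Logstash.
		sincedb_path => "/var/lib/logstash/sincedb_others"
	}
}
```

With a persistent sincedb, a restart should resume from the recorded offset instead of reading each file from the start again.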

(Gautam) #6


This code is just for testing purposes. I have removed sincedb_path => "/dev/null" in my actual code.
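
One detail in the config posted earlier in this thread may be worth checking: the fingerprint filter's target is written as a sprintf expression ("%{[@metadata][fingerprint]}"), while target expects a field reference. If the hash never lands in [@metadata][fingerprint], the document_id in the output cannot resolve to it. A minimal sketch of the relevant parts, assuming the rest of the pipeline stays as posted:

```conf
filter {
	fingerprint {
		source => ["@timestamp", "threadID", "message"]
		concatenate_sources => true
		# Plain field reference, not a "%{...}" sprintf string.
		target => "[@metadata][fingerprint]"
		method => "MURMUR3"
	}
}

output {
	elasticsearch {
		hosts => ["x.x.x.x"]
		# The sprintf form is correct here: it reads the value stored by the filter above.
		document_id => "%{[@metadata][fingerprint]}"
		index => "logstash-%{+YYYY.MM.dd}"
	}
}
```

With the same _id generated for the same event, re-reading a line should overwrite the existing document in that index rather than add a copy. Note that MURMUR3 is a 32-bit hash, so distinct events can collide in a large index; a longer hash such as SHA256 may be a safer choice.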
