Duplicate data parsed by Logstash, causing duplicate data in the Elasticsearch index


(sezanawa) #1

Hello everybody,

I am using ELK for log parsing. I have a job which downloads log files every 15 minutes to a central log directory. Logstash is configured to parse the log files from this central directory and send them to ES.

As I noticed, Logstash sends the same log lines to ES every 15 minutes. That means after an hour I have the same error message, with the same timestamp etc., four times in ES. I am using the following configuration in the file input of Logstash.

file {
  type => "OutputManagement"
  path => ["D:/logs/ApplicationEntLib*.log"]
  start_position => "end"
  #sincedb_path => "NUL"
  ignore_older => 90000
  codec => multiline {
    pattern => "^%{WORD};"
    negate => true
    what => "previous"
  }
}

I have the following questions.

How do I tell Logstash to parse only the new lines in a log file?
Any idea how to stop the duplicate parsing of log files?
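(One common way to make the duplicates harmless, independent of why the files are re-read, is to derive the Elasticsearch document ID from the event content, so a re-parsed line overwrites its earlier copy instead of creating a new document. A sketch using the standard fingerprint filter and elasticsearch output; the host, index name, and key value are placeholders, not my actual settings:)

```
filter {
  fingerprint {
    source => "message"
    target => "[@metadata][fingerprint]"
    method => "SHA256"
    key    => "any-static-string"   # arbitrary value; makes the hash a keyed HMAC
  }
}

output {
  elasticsearch {
    hosts       => ["localhost:9200"]                 # placeholder
    index       => "outputmanagement-%{+YYYY.MM.dd}"  # placeholder
    document_id => "%{[@metadata][fingerprint]}"      # same line => same ID => overwrite
  }
}
```

With a content-derived `document_id`, indexing the same line twice updates one document rather than storing two.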

Thanks in advance

best regards


(Christian Dahlqvist) #2

Logstash keeps track of which files have been processed through inodes. If you copy a file over repeatedly, it is therefore likely to show up as a new file and be parsed again. I would recommend installing Filebeat where the log files are generated and having it ship the logs in near real time, instead of downloading them the way you do now.
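(A small sketch of the identity behaviour described above, using Python's `os.stat`; on Linux `st_ino` is the inode, and on Windows with Python 3.5+ it is the NTFS file index, which the file input tracks the same way. The file names here are made up for illustration:)

```python
import os
import tempfile

def file_id(path):
    # The identity Logstash's sincedb keys on: inode (Linux) / file index (Windows).
    return os.stat(path).st_ino

workdir = tempfile.mkdtemp()
log = os.path.join(workdir, "ApplicationEntLib.log")

with open(log, "w") as f:
    f.write("first download\n")
original_id = file_id(log)

# Appending keeps the identity, so Logstash resumes at its saved offset.
with open(log, "a") as f:
    f.write("new lines from the next download\n")
same_after_append = file_id(log) == original_id

# Writing a fresh file and moving it into place (what a re-download does)
# produces a new identity, so Logstash re-reads everything from the start.
fresh = log + ".new"
with open(fresh, "w") as f:
    f.write("first download\nnew lines from the next download\n")
os.replace(fresh, log)
changed_after_replace = file_id(log) != original_id

print(same_after_append, changed_after_replace)
```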


(sezanawa) #3

Hi @Christian_Dahlqvist

That's what I suspected too. That's why I wrote a small piece of code which reads each downloaded file and copies its data to another location: if the target file does not exist, it creates it; otherwise it just copies the content into the existing file.

Logstash is now parsing files that are never replaced, only updated. Still, there is duplicate data in the ES indices.

Filebeat is a good option, but I can't reach my Logstash from the environment where the original log files live. I have installed ELK on a Windows VM. How can I check whether the file ID has changed, which would cause parsing from the start every time? Anything in the file properties?


(Christian Dahlqvist) #4

In order for the inode to remain constant, I believe you need to append the new content to the file Logstash is tracking, not just copy the full file content over.
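(A minimal sketch of that append-only sync, assuming the source log file only ever grows; the function name and paths are hypothetical, not from the thread:)

```python
import os

def append_new_content(downloaded, tracked):
    """Append only the bytes of `downloaded` that `tracked` does not yet have.

    Assumes the downloaded file only ever grows (old lines are never rewritten),
    which matches a log file that is appended to at the source.
    """
    offset = os.path.getsize(tracked) if os.path.exists(tracked) else 0
    with open(downloaded, "rb") as src:
        src.seek(offset)                  # skip the bytes already copied
        new_bytes = src.read()
    if new_bytes:
        with open(tracked, "ab") as dst:  # append mode keeps the inode stable
            dst.write(new_bytes)
    return len(new_bytes)
```

Because the tracked file is only ever opened in append mode, its inode never changes and Logstash resumes from its sincedb offset instead of re-reading from the start.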


(sezanawa) #5

Thanks @Christian_Dahlqvist

I have now written a small piece of code which appends data to the file Logstash is parsing.

I just ran into the next problem: now Logstash locks the file. Although it has already parsed the file, the Logstash Java process keeps it locked and no other process can append any data to it.

Is there any way to tell Logstash not to lock a file while it is idle?
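(For reference, the file input plugin does have a `close_older` option that releases the handle on files that have not been modified for a given time; whether that solves the locking here depends on the plugin version. A sketch, value chosen as an example:)

```
file {
  path        => ["D:/logs/ApplicationEntLib*.log"]
  close_older => 3600   # seconds in older plugin versions; newer versions
                        # also accept duration strings such as "1 hour"
  # (other settings as in the original input above)
}
```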

I'm running Logstash in a Windows VM, as a Windows service.

Thanks in advance.


(system) #6

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.