Logstash is loading duplicates


#1

Create a log file in HDFS from Hue with one record.
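
For reference, the same one-record file could also be created from the command line (a sketch, assuming the record and the HDFS path /path/test.log shown later in this post; hdfs dfs -put reads from stdin when given -):

echo "APBaaN APBaaN_BusinessPartner 07-05-2018 04:35:434 2018-05-07 06:41:04.844 941" | hdfs dfs -put - /path/test.log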

Run a NiFi job to copy the log file from HDFS to a folder on the ELK server.

Run the Logstash job with the following config:

input {
  file {
    path => "/path/*.log"
    type => "test"
    # Read new files from the beginning rather than the end
    start_position => "beginning"
    # Where Logstash tracks how far it has read each file
    sincedb_path => "/path/test"
  }
}
filter {
  if [type] == "test" {
    grok {
      match => {
        "message" => "%{GREEDYDATA:test}"
      }
    }
  }
}
output {
  if [type] == "test" {
    elasticsearch {
      hosts => "ip:9200"
      index => "test"
    }
  } else {
    stdout {
      codec => rubydebug
    }
  }
}
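
With the config saved as, say, test.conf (a hypothetical file name), the job can be started with:

bin/logstash -f test.conf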

List the index documents:

curl -XGET 'ip:9200/test/_search?q=*&pretty'

"message" : "APBaaN APBaaN_BusinessPartner 07-05-2018 04:35:434 2018-05-07 06:41:04.844 941",
"@timestamp" : "2018-06-22T19:48:42.326Z",

Append a 2nd row to the HDFS log file (appendToFile with - reads the new row from stdin):

hdfs dfs -appendToFile - /path/test.log

APBaaN APBaaN_BusinessPartner 07-06-2018 05:35:434 2018-06-07 07:41:04.844 941

List the index documents again:

curl -XGET 'ip:9200/test/_search?q=*&pretty'

"message" : "APBaaN APBaaN_BusinessPartner 07-05-2018 04:35:434 2018-05-07 06:41:04.844 941",
"@timestamp" : "2018-06-22T19:48:42.326Z",

"message" : "APBaaN APBaaN_BusinessPartner 07-05-2018 04:35:434 2018-05-07 06:41:04.844 941",
"@timestamp" : "2018-06-22T20:28:41.471Z",

"message" : "APBaaN APBaaN_BusinessPartner 07-06-2018 05:35:434 2018-06-07 07:41:04.844 941",
"@timestamp" : "2018-06-22T20:28:41.471Z",

The 1st and 2nd documents are duplicates.


(Christian Dahlqvist) #2

Exactly how are you adding the new row to the file? If you are using an editor, this does not append, but generally results in a new file being created and renamed. Logstash will treat this as a new file and send both lines. In that case you should see two entries in the sincedb file.
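
One way to verify this on the ELK server is to compare the file's inode before and after the edit, since the file input identifies files by inode (a sketch using the path from the config above):

ls -i /path/test.log    # note the inode number
vi /path/test.log       # add a line in the editor and save
ls -i /path/test.log    # a changed inode means Logstash sees a brand-new file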


#3

I am using an editor to add the 2nd row to the log file.


(Christian Dahlqvist) #4

That explains it then.


#5

If I don't use an editor, how do I test adding a 2nd row?


(Christian Dahlqvist) #6

If you are on Linux:

echo "Test log entry" >> /path/test.log
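
After a true append you can also check the sincedb file from the config (sincedb_path => "/path/test") to confirm the log is still tracked as a single entry:

echo "Test log entry" >> /path/test.log
cat /path/test    # each line tracks one file by inode and read offset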


#7

There is no hdfs dfs -echo command.

There is an hdfs dfs -appendToFile command.
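
appendToFile reads from stdin when given - as the source, so the echo suggestion can be piped straight into it:

echo "Test log entry" | hdfs dfs -appendToFile - /path/test.log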

I have included the NiFi job in my original post.


(Christian Dahlqvist) #8

OK, you are trying to read from HDFS. That could indeed be a problem. I would recommend you read this note about the file input plugin and remote file systems. The same applies to Filebeat.


(system) #9

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.