Logstash is loading duplicates

Create a log file in HDFS from Hue with one record.

Run a NiFi job to copy the log file from HDFS to the ELK server folder.

Run the Logstash job.

input {
  file {
    path => "/path/*.log"
    type => "test"
    start_position => "beginning"
    sincedb_path => "/path/test"
  }
}
filter {
  if [type] == "test" {
    grok {
      match => {
        "message" => "%{GREEDYDATA:test}"
      }
    }
  }
}
output {
  if [type] == "test" {
    elasticsearch {
      hosts => "ip:9200"
      index => "test"
    }
  } else {
    stdout {
      codec => rubydebug
    }
  }
}

List index documents.

curl -XGET 'ip:9200/test/_search?q=*&pretty'

"message" : "APBaaN APBaaN_BusinessPartner 07-05-2018 04:35:434 2018-05-07 06:41:04.844 941",
"@timestamp" : "2018-06-22T19:48:42.326Z",

Append a 2nd row to the HDFS log file.

hdfs dfs -appendToFile - /path/test.log

APBaaN APBaaN_BusinessPartner 07-06-2018 05:35:434 2018-06-07 07:41:04.844 941

List index documents.

curl -XGET 'ip:9200/test/_search?q=*&pretty'

"message" : "APBaaN APBaaN_BusinessPartner 07-05-2018 04:35:434 2018-05-07 06:41:04.844 941",
"@timestamp" : "2018-06-22T19:48:42.326Z",

"message" : "APBaaN APBaaN_BusinessPartner 07-05-2018 04:35:434 2018-05-07 06:41:04.844 941",
"@timestamp" : "2018-06-22T20:28:41.471Z",

"message" : "APBaaN APBaaN_BusinessPartner 07-06-2018 05:35:434 2018-06-07 07:41:04.844 941",
"@timestamp" : "2018-06-22T20:28:41.471Z",

The 1st and 2nd documents are duplicates.

Exactly how are you adding the new row to the file? If you are using an editor, this does not append; it generally results in a new file being created and renamed into place. Logstash will treat this as a new file and send both lines again. In that case you should see two entries in the sincedb file.
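A minimal sketch of why an editor "save" looks like a new file to Logstash — the file input tracks files by inode, and a typical editor writes a temp file and renames it over the original, which changes the inode, while a true append keeps it (paths here are throwaway examples, not the ones from the thread):

```shell
# Start with a one-row log file and note its inode.
printf 'row one\n' > /tmp/editor_demo.log
before=$(stat -c %i /tmp/editor_demo.log)

# Simulate an editor save: write a new file, then rename it into place.
printf 'row one\nrow two\n' > /tmp/editor_demo.log.tmp
mv /tmp/editor_demo.log.tmp /tmp/editor_demo.log
after=$(stat -c %i /tmp/editor_demo.log)

# A true append (>>) modifies the file in place and keeps the inode.
echo 'row three' >> /tmp/editor_demo.log

[ "$before" != "$after" ] && echo "editor-style save changed the inode"
[ "$after" = "$(stat -c %i /tmp/editor_demo.log)" ] && echo ">> append kept the inode"
```

Since the sincedb keys off the inode, the renamed file is re-read from the beginning, which matches the duplicates seen above.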

I am using an editor to add the 2nd row to the log file.

That explains it then.

If I don't use an editor, how do I test appending a 2nd row?

If you are on Linux: echo "Test log entry" >> /path/test.log

There is no hdfs dfs -echo command.

There is an hdfs dfs -appendToFile command.
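Those two can be combined: with `-` as the source, `hdfs dfs -appendToFile` reads from stdin, so you can pipe `echo` into it and get a true in-place append on the HDFS side. A sketch of the pattern (the hdfs line is shown as a comment since it needs a cluster; the local lines below demonstrate the same stdin/append flow):

```shell
# On a cluster, appending one record to the HDFS file without an editor:
#   echo "APBaaN APBaaN_BusinessPartner ..." | hdfs dfs -appendToFile - /path/test.log

# The same stdin-driven append pattern, locally:
printf 'line one\n' > /tmp/test.log
echo "line two" | tee -a /tmp/test.log > /dev/null   # tee -a appends stdin to the file
cat /tmp/test.log
```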

I have included the NiFi job in my original post.

OK, you are trying to read from HDFS. That could indeed be a problem. I would recommend you read this note about the file input plugin and remote file systems. The same applies to Filebeat.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.