Logstash is loading duplicates


#1

Create a log file in HDFS from Hue with one record.
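
For reference, the same one-record file could also be created from the command line (a sketch, assuming the record and the HDFS path /path/test.log shown later in this post; hdfs dfs -put reads from stdin when given -):

echo "APBaaN APBaaN_BusinessPartner 07-05-2018 04:35:434 2018-05-07 06:41:04.844 941" | hdfs dfs -put - /path/test.log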

Run a NiFi job to copy the log file from HDFS to a folder on the ELK server.

Run the Logstash job with the following config:

input {
  file {
    path => "/path/*.log"
    type => "test"
    # Read new files from the beginning rather than the end
    start_position => "beginning"
    # Where Logstash tracks how far it has read each file
    sincedb_path => "/path/test"
  }
}
filter {
  if [type] == "test" {
    grok {
      match => {
        "message" => "%{GREEDYDATA:test}"
      }
    }
  }
}
output {
  if [type] == "test" {
    elasticsearch {
      hosts => "ip:9200"
      index => "test"
    }
  } else {
    stdout {
      codec => rubydebug
    }
  }
}
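
With the config saved as, say, test.conf (a hypothetical file name), the job can be started with:

bin/logstash -f test.conf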

List the index documents:

curl -XGET 'ip:9200/test/_search?q=*&pretty'

"message" : "APBaaN APBaaN_BusinessPartner 07-05-2018 04:35:434 2018-05-07 06:41:04.844 941",
"@timestamp" : "2018-06-22T19:48:42.326Z",

Append a 2nd row to the HDFS log file (appendToFile with - reads the new row from stdin):

hdfs dfs -appendToFile - /path/test.log

APBaaN APBaaN_BusinessPartner 07-06-2018 05:35:434 2018-06-07 07:41:04.844 941

List the index documents again:

curl -XGET 'ip:9200/test/_search?q=*&pretty'

"message" : "APBaaN APBaaN_BusinessPartner 07-05-2018 04:35:434 2018-05-07 06:41:04.844 941",
"@timestamp" : "2018-06-22T19:48:42.326Z",

"message" : "APBaaN APBaaN_BusinessPartner 07-05-2018 04:35:434 2018-05-07 06:41:04.844 941",
"@timestamp" : "2018-06-22T20:28:41.471Z",

"message" : "APBaaN APBaaN_BusinessPartner 07-06-2018 05:35:434 2018-06-07 07:41:04.844 941",
"@timestamp" : "2018-06-22T20:28:41.471Z",

The 1st and 2nd documents are duplicates.


(Christian Dahlqvist) #2

Exactly how are you adding the new row to the file? If you are using an editor, this does not append, but generally results in a new file being created and renamed. Logstash will treat this as a new file and send both lines. In that case you should see two entries in the sincedb file.
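
One way to verify this on the ELK server is to compare the file's inode before and after the edit, since the file input identifies files by inode (a sketch using the path from the config above):

ls -i /path/test.log    # note the inode number
vi /path/test.log       # add a line in the editor and save
ls -i /path/test.log    # a changed inode means Logstash sees a brand-new file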


#3

I am using an editor to add the 2nd row to the log file.


(Christian Dahlqvist) #4

That explains it then.


#5

If I don't use an editor, how do I test adding a 2nd row?


(Christian Dahlqvist) #6

If you are on Linux:

echo "Test log entry" >> /path/test.log
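
After a true append you can also check the sincedb file from the config (sincedb_path => "/path/test") to confirm the log is still tracked as a single entry:

echo "Test log entry" >> /path/test.log
cat /path/test    # each line tracks one file by inode and read offset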


#7

There is no hdfs dfs -echo command.

There is an hdfs dfs -appendToFile command.
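
appendToFile reads from stdin when given - as the source, so the echo suggestion can be piped straight into it:

echo "Test log entry" | hdfs dfs -appendToFile - /path/test.log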

I have included the NiFi job in my original post.


(Christian Dahlqvist) #8

OK, you are trying to read from HDFS. That could indeed be a problem. I would recommend you read this note about the file input plugin and remote file systems. The same applies to Filebeat.


(system) #9

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.