Duplicate documents after Logstash restart


(Mark) #1

Using Windows 7
Using Logstash 5.4.1

input {
  azurewadtable {
    add_field => { "input_origin" => "MyServerLogs" }
    storage_account_name => "MyDeploy"
    storage_sas_token => "xxxxxxxxxx"
    table_name => "WadLogsTable"
    collection_start_time_utc => "2017-09-01"
  }
}
Each record gets a unique document_id, but after a crash all documents are duplicated.

Output:

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    document_id => "%{PartitionKey}-%{RowKey}"
  }
}
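The reason a deterministic document_id is supposed to prevent duplicates: when Elasticsearch indexes a document with an _id that already exists, it overwrites the old version instead of creating a second copy. A minimal sketch of that behaviour, with a plain Python dict standing in for the Elasticsearch index (the key values are made up for illustration):

```python
# Sketch: why a deterministic document_id deduplicates on re-ingest.
# A dict stands in for an Elasticsearch index; indexing twice with the
# same _id replaces the document rather than adding a duplicate.
index = {}

def index_doc(partition_key: str, row_key: str, body: dict) -> None:
    # Mirrors document_id => "%{PartitionKey}-%{RowKey}"
    doc_id = f"{partition_key}-{row_key}"
    index[doc_id] = body  # same _id -> overwrite, not duplicate

index_doc("WADLogs", "row-0001", {"msg": "first write"})
index_doc("WADLogs", "row-0001", {"msg": "re-ingested after restart"})

print(len(index))  # -> 1: the second write replaced the first
```

So if duplicates still appear, the _id being generated after the restart must differ from the one generated before it.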

I found the following in the "Little Logstash Lessons: Handling Duplicates" blog post and gave it a try.

filter {
  fingerprint {
    source => ["PartitionKey", "RowKey"]
    concatenate_sources => true
    method => "MURMUR3"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    document_id => "%{fingerprint}"
  }
}

I now get a unique fingerprint field, which I assign to document_id.
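The fingerprint approach should be deterministic: hashing the same PartitionKey and RowKey always yields the same value, so a re-ingested event gets the same _id across restarts. A rough sketch of that property (MD5 stands in for MurmurHash3, which is not in Python's standard library, and the key values are made up):

```python
# Sketch: a deterministic fingerprint of PartitionKey + RowKey produces
# the same document id on every run, so re-indexing the same event
# should overwrite rather than duplicate. MD5 is used here only as an
# illustrative stand-in for the MURMUR3 method in the fingerprint filter.
import hashlib

def fingerprint(partition_key: str, row_key: str) -> str:
    # concatenate_sources => true joins the source fields before hashing
    return hashlib.md5(f"{partition_key}{row_key}".encode()).hexdigest()

before_restart = fingerprint("WADLogs", "row-0001")
after_restart = fingerprint("WADLogs", "row-0001")

print(before_restart == after_restart)  # -> True: same event, same _id
```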

However, whenever Logstash restarts, all documents are duplicated.

How can I prevent this from happening?


(system) #2

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.