Duplicate documents after Logstash restart


(Mark) #1

Using Windows 7
Using Logstash 5.4.1

input {
  azurewadtable {
    add_field => { "input_origin" => "MyServerLogs" }
    storage_account_name => "MyDeploy"
    storage_sas_token => "xxxxxxxxxx"
    table_name => "WadLogsTable"
    collection_start_time_utc => "2017-09-01"
  }
}
Each record gets a unique document_id, but after a crash all documents are duplicated.

Output:

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    document_id => "%{PartitionKey}-%{RowKey}"
  }
}
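The reason a deterministic document_id is supposed to prevent duplicates: when Elasticsearch indexes a document with an _id that already exists, it overwrites the old version instead of creating a second copy. A minimal sketch of that behaviour, with a plain Python dict standing in for the Elasticsearch index (the key values are made up for illustration):

```python
# Sketch: why a deterministic document_id deduplicates on re-ingest.
# A dict stands in for an Elasticsearch index; indexing twice with the
# same _id replaces the document rather than adding a duplicate.
index = {}

def index_doc(partition_key: str, row_key: str, body: dict) -> None:
    # Mirrors document_id => "%{PartitionKey}-%{RowKey}"
    doc_id = f"{partition_key}-{row_key}"
    index[doc_id] = body  # same _id -> overwrite, not duplicate

index_doc("WADLogs", "row-0001", {"msg": "first write"})
index_doc("WADLogs", "row-0001", {"msg": "re-ingested after restart"})

print(len(index))  # -> 1: the second write replaced the first
```

So if duplicates still appear, the _id being generated after the restart must differ from the one generated before it.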

I found the following in the "Little Logstash Lessons: Handling Duplicates" blog post and gave it a try.

filter {
  fingerprint {
    source => ["PartitionKey", "RowKey"]
    concatenate_sources => true
    method => "MURMUR3"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    document_id => "%{fingerprint}"
  }
}

I now get a unique fingerprint field, which I assign to document_id.
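The fingerprint approach should be deterministic: hashing the same PartitionKey and RowKey always yields the same value, so a re-ingested event gets the same _id across restarts. A rough sketch of that property (MD5 stands in for MurmurHash3, which is not in Python's standard library, and the key values are made up):

```python
# Sketch: a deterministic fingerprint of PartitionKey + RowKey produces
# the same document id on every run, so re-indexing the same event
# should overwrite rather than duplicate. MD5 is used here only as an
# illustrative stand-in for the MURMUR3 method in the fingerprint filter.
import hashlib

def fingerprint(partition_key: str, row_key: str) -> str:
    # concatenate_sources => true joins the source fields before hashing
    return hashlib.md5(f"{partition_key}{row_key}".encode()).hexdigest()

before_restart = fingerprint("WADLogs", "row-0001")
after_restart = fingerprint("WADLogs", "row-0001")

print(before_restart == after_restart)  # -> True: same event, same _id
```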

However, whenever Logstash restarts, all documents are duplicated.

How can I prevent this from happening?


(system) #2

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.