Using Windows 7
Using Logstash 5.4.1
input {
  azurewadtable {
    add_field => { "input_origin" => "MyServerLogs" }
    storage_account_name => "MyDeploy"
    storage_sas_token => "xxxxxxxxxx"
    table_name => "WadLogsTable"
    collection_start_time_utc => "2017-09-01"
  }
}
Each record has a unique document_id, but after a crash all documents are duplicated.
Output:
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    document_id => "%{PartitionKey}-%{RowKey}"
  }
}
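To illustrate why a deterministic document_id should make re-indexing idempotent, here is a minimal Python sketch. It is not the Elasticsearch API; a plain dict stands in for the index, and the event fields are made up for the example:

```python
def doc_id(event):
    # Mirrors the sprintf template "%{PartitionKey}-%{RowKey}"
    return f"{event['PartitionKey']}-{event['RowKey']}"

def index(store, event):
    # Same id -> the document is overwritten (an update), not duplicated
    store[doc_id(event)] = event

# Hypothetical events standing in for WadLogsTable rows
events = [
    {"PartitionKey": "p1", "RowKey": "r1", "msg": "a"},
    {"PartitionKey": "p1", "RowKey": "r2", "msg": "b"},
]

store = {}
for e in events:          # first run
    index(store, e)
for e in events:          # the same rows replayed after a crash/restart
    index(store, e)

assert len(store) == 2    # still two documents, no duplicates
```

This is the behavior I expected from the config above: replaying the same rows should overwrite rather than duplicate.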
I found the following in the "Little Logstash Lessons: Handling Duplicates" article and gave it a try.
filter {
  fingerprint {
    source => ["PartitionKey", "RowKey"]
    concatenate_sources => true
    method => "MURMUR3"
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    document_id => "%{fingerprint}"
  }
}
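To show the property I am relying on, here is a small Python sketch of a deterministic fingerprint over the concatenated source fields. SHA-256 stands in for MURMUR3, and the exact join format used by `concatenate_sources` is an assumption; any deterministic hash demonstrates the same point:

```python
import hashlib

def fingerprint(event, sources=("PartitionKey", "RowKey")):
    # concatenate_sources => true joins the source fields before hashing.
    # The "|" separator here is an assumption for illustration only.
    joined = "|".join(str(event[f]) for f in sources)
    return hashlib.sha256(joined.encode()).hexdigest()

e = {"PartitionKey": "p1", "RowKey": "r1", "msg": "hello"}
run1 = fingerprint(e)
run2 = fingerprint(dict(e))   # the same record seen again after a restart
assert run1 == run2           # same id both times -> overwrite, not duplicate
```

Since the hash depends only on PartitionKey and RowKey, the same row should produce the same document_id on every run.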
I now get a unique fingerprint field, which I assign to document_id.
However, whenever Logstash restarts, all documents are duplicated again.
How can I prevent this from happening?