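My CSV file has 26041 unique lines (I checked by manually computing the sha1sum of each line). However, when I use the fingerprint as the document_id, Elasticsearch stores only 25919 documents. One document has the wrong _id, the unsubstituted literal:

    _id: %{fingerprint} _type:logs _index:logstash-2015.07.14

This may explain why 122 records failed to load. If 123 events reached the output without a fingerprint field, they would all be indexed under that same literal _id, each overwriting the previous one, which leaves 26041 - 123 + 1 = 25919 documents. Below is an excerpt of my config: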

    input { stdin {} }
    filter {
      csv { ... }
      fingerprint { method => "SHA1" key => "RoamMonitor" }
    }
    output {
      elasticsearch {
        host => "localhost"
        cluster => "rm_cluster_dev"
        document_id => "%{fingerprint}"
      }
    }

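One way to find the events that reach the output without a fingerprint field is to tag them with a conditional and print them. This is a minimal sketch, assuming the filter writes to its default target field `fingerprint`; the tag name `missing_fingerprint` is made up for illustration. Logstash applies filters in the order they appear, so these sections would be appended after the existing ones:

```
filter {
  # Runs after the fingerprint filter: tag events that have no fingerprint field
  if ![fingerprint] {
    mutate { add_tag => ["missing_fingerprint"] }
  }
}
output {
  # Print only the tagged events, so their original lines can be inspected
  if "missing_fingerprint" in [tags] {
    stdout { codec => rubydebug }
  }
}
```

The tagged events are still sent to Elasticsearch as well; comparing their `message` fields should show what the affected lines have in common.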
If I disable the fingerprint filter and let Elasticsearch auto-generate the _id, all 26041 records are loaded.
Any ideas on how to debug/troubleshoot this?
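A complementary offline check can recompute the _id each line is expected to get and list the lines whose _id is missing from the index. This is a sketch under assumptions. It assumes that, when `key` is set, the fingerprint filter stores the HMAC-SHA1 hex digest of the `message` field, which is the input line without its trailing `"\n"`. The file `stored_ids.txt` is hypothetical: a plain export of the index's _id values, one per line. The script name `check_ids.py` is also made up.

```python
import hashlib
import hmac
import sys

# Assumption: with `key` set, the fingerprint filter stores the HMAC-SHA1 hex
# digest of the `message` field (the input line without its trailing "\n").
KEY = b"RoamMonitor"


def fingerprint(message: str, key: bytes = KEY) -> str:
    """Return the HMAC-SHA1 hex digest expected as the document _id."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha1).hexdigest()


def expected_ids(lines) -> dict:
    """Map expected _id -> message; duplicate lines collapse to one entry."""
    ids = {}
    for line in lines:
        # Strip only "\n", so a "\r" from CRLF files stays part of the message
        message = line[:-1] if line.endswith("\n") else line
        ids[fingerprint(message)] = message
    return ids


def missing_lines(lines, stored_ids) -> list:
    """Messages whose expected _id is not among the ids stored in the index."""
    return [msg for fp, msg in expected_ids(lines).items() if fp not in stored_ids]


if __name__ == "__main__":
    # Usage (hypothetical file names): python check_ids.py data.csv stored_ids.txt
    csv_path, ids_path = sys.argv[1], sys.argv[2]
    # newline="\n" splits only on "\n" and returns line endings untranslated
    with open(csv_path, encoding="utf-8", newline="\n") as f:
        lines = f.readlines()
    with open(ids_path, encoding="utf-8") as f:
        stored = {s.strip() for s in f if s.strip()}
    print(f"lines={len(lines)} distinct_ids={len(expected_ids(lines))}")
    for msg in missing_lines(lines, stored):
        print(repr(msg))
```

If nearly every line is reported as missing, the hashing assumption is probably wrong rather than the data. If only a small set is reported, those lines are the likely candidates for the events that ended up under the literal `%{fingerprint}` id.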