File "/path/to/logs/processed.log" contains two records for "/path/to/input.csv" when there should be only one!
Also, Elasticsearch holds twice as many documents as expected, with duplicates from this file.
The sincedb should prevent this. You would have to provide more information, including matching sincedb entries and entries from the file completed log. Did you restart logstash?
There are two entries of "/path/to/input.csv" in "/path/to/logs/processed.log".
And there is one entry for "/path/to/input.csv" in /var/lib/logstash/plugins/inputs/file/.sincedb_ like this:
4278179 0 64769 758 1560832674.226105 "/path/to/input.csv"
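For anyone comparing the two files by hand, here is a rough Python sketch that may help. It assumes the sincedb line uses the six-field layout from the file input documentation (inode, major device number, minor device number, byte offset, last-active timestamp, path) and that the completed log holds one path per line. The function names `parse_sincedb_line` and `duplicated_paths` are made up for illustration and are not part of Logstash.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple


class SincedbEntry(NamedTuple):
    inode: int
    major_device: int
    minor_device: int
    offset: int          # bytes already read from the file
    last_active: float   # epoch seconds
    path: str


def parse_sincedb_line(line: str) -> SincedbEntry:
    """Parse one sincedb record (assumed six-field layout)."""
    # Split into at most 6 fields so a path containing spaces stays intact.
    parts = line.strip().split(None, 5)
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, got {len(parts)}: {line!r}")
    inode, major, minor, offset, last_active, path = parts
    return SincedbEntry(
        int(inode), int(major), int(minor),
        int(offset), float(last_active),
        path.strip('"'),  # drop surrounding quotes if present
    )


def duplicated_paths(lines: Iterable[str]) -> Dict[str, int]:
    """Return paths that appear more than once in the completed log."""
    counts = Counter(l.strip().strip('"') for l in lines if l.strip())
    return {path: n for path, n in counts.items() if n > 1}


if __name__ == "__main__":
    entry = parse_sincedb_line(
        '4278179 0 64769 758 1560832674.226105 "/path/to/input.csv"'
    )
    print(entry.path, entry.offset)
```

Comparing `entry.offset` with the size of the input file shows whether the file was read to the end, and `duplicated_paths(open("/path/to/logs/processed.log"))` lists any path that was logged as completed more than once.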
It's certainly not "by design". The only reason to have a sincedb on disk is to persist the in-memory sincedb across restarts so that the file input does not re-read files when logstash is restarted.