The files I need to push to Elasticsearch for indexing are all "completed" files, i.e. nothing is appended to them any more. To avoid duplicate documents I'm trying the file_completed_action setting. Setting a document ID is not an option, as my data files are not very well structured.
In my Logstash config:
mode => "read"
file_completed_action => "delete"
And I'm setting Filebeat to pick up the files. With the above settings the data files are deleted, but not all lines are fed into Elasticsearch. The document count reported for the Elasticsearch index doesn't match the total line count of all data files; it is usually only about 50%-60% of the expected count. In fact, the count in Elasticsearch differs every time I redo a test run after deleting the index.
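For reference, this is roughly how the expected line count can be computed. It is only a minimal sketch: the count_lines helper, the directory path and the "*.log" pattern are made up for illustration, and it has to run before the files are picked up and deleted.

```python
from pathlib import Path
import sys


def count_lines(directory, pattern="*.log"):
    """Return the total number of lines in all files matching pattern."""
    total = 0
    for path in sorted(Path(directory).glob(pattern)):
        # Read in binary mode so odd encodings cannot break the count.
        with path.open("rb") as fh:
            total += sum(1 for _ in fh)
    return total


if __name__ == "__main__":
    # Hypothetical default location; pass the real directory as an argument.
    data_dir = sys.argv[1] if len(sys.argv) > 1 else "/path/to/data"
    print(count_lines(data_dir))
```

The printed total can then be compared with what the index reports, for example via the Elasticsearch _count API, assuming one document is created per line.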
Am I missing some configuration setting needed to make this "delete-after-harvest" behaviour work? The data files are definitely deleted before they are fully harvested.
Thanks in advance!