Hi,
I am new to Elastic and Kibana.
The scenario is as below:
We have a folder called "livedata".
A new CSV file arrives in this folder via FTP every one to two seconds.
I have created a config logstash where path of these csv is a defined
ex : path => a/b/c/*.csv
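For reference, the relevant input section of our config looks roughly like this (only the wildcard path is from our setup; the rest is a minimal sketch):

input {
  file {
    # same wildcard path as above; new CSVs matching it are picked up
    path => "a/b/c/*.csv"
    # read each file from its first line rather than only tailing new lines
    start_position => "beginning"
  }
}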
We want each CSV, once it has been loaded successfully, to be moved to another folder.
How can we know that Logstash has loaded a file successfully?
Our issue is that the folder will keep accumulating files, and whenever Logstash restarts, it will start loading the CSVs again, which causes duplicates. Yes, we can create a unique document_id to prevent this (roughly as in the sketch below), but the load on the server will still increase, because Logstash will try to index every event again and Elasticsearch will reject each one due to the constraint defined by the document_id (_id).
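What we have in mind for the unique document_id is roughly the following (the fingerprint settings, host, and index name are placeholders, not our actual config):

filter {
  fingerprint {
    # hash the raw line so re-reading the same line yields the same _id
    source => "message"
    target => "[@metadata][fingerprint]"
    method => "SHA256"
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]   # placeholder host
    index => "livedata"           # placeholder index name
    document_id => "%{[@metadata][fingerprint]}"
    # "create" rejects events whose _id already exists instead of overwriting them
    action => "create"
  }
}

Even with this in place, every duplicate still costs a request that Elasticsearch has to receive and reject, which is the extra load we are worried about.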
For scale: about 80K files arrive every 24 hours, each 5 to 8 MB in size, and there are more than 13 such sources we need to process and load.
Logstash tracks the position of each file it is reading and writes it to a file called the sincedb. If Logstash crashes or is restarted, it will resume reading from the last position written, unless you set the sincedb path to /dev/null.
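For example, you can pin the sincedb to a known location on the file input (the path below is just an illustration):

input {
  file {
    path => "a/b/c/*.csv"
    # persist read positions here so a restart resumes instead of re-reading
    sincedb_path => "/var/lib/logstash/livedata.sincedb"
    # use sincedb_path => "/dev/null" instead to always re-read from scratch
  }
}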
Also, if you are not using any filters in Logstash and are just reading the files and shipping them to Elasticsearch, you could use Filebeat instead of Logstash to read the files and ship them directly.
Filebeat also keeps track of what has already been read and sent, but it likewise has no option to delete or move a file after it has been read.
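If you want to try it, a minimal Filebeat sketch for this layout could look like the following (the path and host are placeholders, and this assumes shipping straight to Elasticsearch):

filebeat.inputs:
  - type: log
    paths:
      - a/b/c/*.csv

output.elasticsearch:
  hosts: ["localhost:9200"]

Filebeat records its read offsets in a registry file, so after a restart it resumes where it left off rather than resending the files.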