I am using Logstash to ingest billions of records into Elasticsearch.
I read the files with the `file` input plugin, with `mode` set to `read`, because all of the data is already on disk.
The problem is that each file is very large (close to 30 GB) and contains about 200 million records on average, so Logstash needs a long time to process each one.
I noticed that Logstash writes to the sincedb file only after it has finished reading a file completely. This is a problem because, if Logstash is restarted partway through a file, that file may be read again from the beginning.
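
For reference, here is a minimal sketch of the kind of `file` input described above. The path and sincedb location are placeholders, not my real values. `sincedb_write_interval` is the plugin setting documented as controlling how often the sincedb is written (15 seconds by default); I have not confirmed whether it takes effect mid-file in `read` mode, which is essentially what I am asking.

```
input {
  file {
    # Placeholder path: point this at the real input files
    path => ["/data/input/*.log"]
    # Read each file from start to end instead of tailing it for new lines
    mode => "read"
    # Placeholder location for the sincedb file that records read positions
    sincedb_path => "/var/lib/logstash/sincedb_bulk"
    # How often (in seconds) the sincedb is written; 15 is the default
    sincedb_write_interval => 15
  }
}
```
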
- Is my observation correct?
- How do I configure Logstash to write to the sincedb file more often, so that it resumes from where it left off after a restart?
Thanks in advance