I'm having trouble re-importing existing logfiles (600 MB each) into Logstash, and I'm wondering what the best practice for this is.
Our setup is: Multiple hosts with filebeat daemons -> logstash -> elasticsearch
When something bad happens we need to re-import specific logfiles from all the hosts into Logstash some days later. By then the files are log-rotated and bzip'ed, so they are no longer under filebeat's control. (We already use a custom document_id, so duplicates are not an issue.)
I can think of multiple ways to do this:
- reading the logfile and sending the lines via curl to Logstash (http input)
- unzipping the logfile and starting filebeat from the CLI with a prospector on the unzipped file
- copying the logfile to the Logstash server, unzipping it, and adding a file input in Logstash for the unzipped file
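For the filebeat variant, a one-off config pointed at the unzipped file could look roughly like this. This is only a sketch: the paths, the separate registry file, and the Logstash host are assumptions, and the exact keys depend on your filebeat version:

```yaml
# one-off filebeat config for re-importing a single unzipped file (sketch)
filebeat:
  prospectors:
    - input_type: log
      paths:
        - /tmp/reimport/app.log.1    # the unzipped rotated file (assumed path)
  # separate registry file so the daemon's normal state is untouched (assumed path)
  registry_file: /tmp/reimport/registry
output:
  logstash:
    hosts: ["logstash.example:5044"]  # assumed host/port
```

If your filebeat version supports it, the `-once` flag makes filebeat exit once all harvesters reach EOF, which would also answer the "is it done yet" question for this variant.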
The curl solution seems very inefficient to me for that number of loglines. I know I can send multiple events per message, but we get malformed loglines (3rd-party application) from time to time, and Logstash discards the complete bulk when one event in it is malformed. So we would probably lose 99 events where we could lose just 1 by sending them one by one.
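For completeness, the one-event-per-request variant can be sketched like this (Python stdlib only). The http-input URL is a hypothetical assumption, and the `post` parameter exists purely so the loop can be exercised without a running Logstash:

```python
import bz2
import urllib.request

LOGSTASH_URL = "http://logstash.example:8080"  # hypothetical http-input endpoint


def replay(path, post=None):
    """Send one event per request so a malformed line costs only itself.

    Reads the bzip'ed logfile directly (no unzip step needed) and returns
    the lines that failed, so they can be inspected or retried later.
    """
    if post is None:
        def post(line):
            req = urllib.request.Request(
                LOGSTASH_URL, data=line,
                headers={"Content-Type": "text/plain"})
            urllib.request.urlopen(req, timeout=10)

    failed = []
    with bz2.open(path, "rb") as fh:
        for line in fh:
            line = line.rstrip(b"\n")
            if not line:
                continue  # skip empty lines
            try:
                post(line)
            except Exception:
                failed.append(line)
    return failed
```

One request per line is of course the slow path; this only makes sense when losing whole bulks to single malformed lines is the bigger problem.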
The file-based approach seems perfect to me, but how do I check whether the whole file has been processed so that I can safely remove it?
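One way to answer the "fully processed?" question: both filebeat's registry and Logstash's file-input sincedb record a byte offset per file, so the offset can be compared against the file size. A sketch, assuming the registry layout where the file is a JSON array of per-file states with `source`/`offset` keys (the format varies between versions, so check yours first):

```python
import json
import os


def recorded_offset(registry_path, logfile):
    """Return the byte offset recorded for logfile, or 0 if untracked.

    Assumes a registry that is a JSON array of states with "source" and
    "offset" keys; older filebeat versions use a JSON object keyed by
    path instead, so adjust the parsing for your version.
    """
    with open(registry_path) as fh:
        states = json.load(fh)
    for state in states:
        if state.get("source") == logfile:
            return state.get("offset", 0)
    return 0


def fully_read(registry_path, logfile):
    """True once the recorded offset has reached the end of the file."""
    return recorded_offset(registry_path, logfile) >= os.path.getsize(logfile)
```

Note this only tells you the shipper has read the file to EOF, not that Elasticsearch has acknowledged every event, so keep the file around until you have verified the indexed counts.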
Thank you in advance