Best practice for reimporting files into Logstash?

Hi there,

I'm having trouble re-importing existing logfiles (600 MB each) into Logstash, and I'm wondering what the best practice for this is.

Our setup is: multiple hosts with Filebeat daemons -> Logstash -> Elasticsearch

When something bad happens, we need to re-import specific logfiles from all the hosts into Logstash some days later (they are log-rotated and bzip2-compressed, so they're no longer under Filebeat's control). We already use a custom document_id, so duplicates are not an issue.
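(For reference, the dedup works by setting an explicit document_id on the elasticsearch output; a fingerprint-based sketch of the idea, with a placeholder host — not necessarily our exact config:)

```
# A stable ID derived from the log line means a re-imported event
# overwrites itself in ES instead of creating a duplicate.
filter {
  fingerprint {
    source => "message"
    target => "[@metadata][doc_id]"
    method => "SHA1"
  }
}
output {
  elasticsearch {
    hosts       => ["http://es.example.com:9200"]   # placeholder
    document_id => "%{[@metadata][doc_id]}"
  }
}
```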

I can think of multiple ways to do this:

  1. reading the logfile and sending the lines via curl to Logstash (http input)
  2. unzipping the logfile and starting the Filebeat CLI with a prospector on the unzipped file
  3. copying the logfile to the Logstash server, unzipping it, and adding a file input in Logstash on the unzipped file

1:

The curl solution seems very inefficient to me for this volume of loglines. I know I can send multiple events per request, but we get malformed loglines (from a 3rd-party application) from time to time, and Logstash will discard the complete bulk when one event in it is malformed, so we would probably lose 99 good events where we could lose just 1 by sending them one by one.
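For illustration, the one-by-one variant would look something like this (the Logstash host/port is a placeholder for an http input):

```bash
# Send one logline per request, so a single malformed line only loses
# that one event instead of a whole bulk. Slow, but lossless otherwise.
while IFS= read -r line; do
  curl -s -X POST 'http://logstash.example.com:8080/' \
       -H 'Content-Type: text/plain' \
       --data-binary "$line"
done < app.log
```

That is one HTTP round-trip per line, which is what worries me for 600 MB files.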

2+3:

These seem perfect to me, but I wonder how to check whether the whole file has been processed so that I can remove it?
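The only check I can think of for option 3 is comparing the offset the file input records in its sincedb with the file size, roughly like this (paths are placeholders, and it assumes the classic sincedb columns: inode, major device, minor device, offset):

```bash
# When the offset Logstash stored for the file's inode equals the file
# size, the file has been read to EOF (though events may still be in
# flight to ES at that moment).
FILE=/var/tmp/reimport/app.log
SINCEDB=/var/lib/logstash/.sincedb_reimport

SIZE=$(stat -c %s "$FILE")
INODE=$(stat -c %i "$FILE")
OFFSET=$(awk -v inode="$INODE" '$1 == inode { print $4 }' "$SINCEDB")

[ "$OFFSET" = "$SIZE" ] && echo "fully read -- safe to remove $FILE"
```

But is that reliable enough in practice?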

Thank you in advance 🙂

Can somebody help me out with some advice on this topic, please?

Hi,

Here's how I usually do it: fire up a couple of Logstash VMs, NFS-mount the logs I want to import, and bzcat them into a stdin input on Logstash with two outputs, one to ES and the other to stdout so I can check when the process is over.
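Roughly like this as a pipeline config (ES host and index name are placeholders):

```
# reimport.conf -- events come in on stdin; one output goes to ES, the
# other to stdout so the end of the run is visible in the terminal.
input {
  stdin { }
}
output {
  elasticsearch {
    hosts => ["http://es.example.com:9200"]   # placeholder
    index => "logs-reimport"                  # placeholder
  }
  stdout { codec => dots }   # prints one dot per event
}
```

driven by something like `bzcat /mnt/logs/app.log.1.bz2 | bin/logstash -f reimport.conf`.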


Thank you Nicolas,

I just played around a bit more and it looks like my final solution will be the following:

(The following applies to each Filebeat node, but that's what deployment pipelines are for, right?)

  1. copy & unzip the file to re-import
  2. deploy a custom filebeat.yml (with close_eof)
  3. run filebeat.sh on the command line with the -once parameter and the custom filebeat.yml (see the sketch below)
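A sketch of that custom config, assuming a Filebeat version that still uses prospectors (newer releases call these inputs); the paths and the Logstash host are placeholders:

```yaml
# filebeat-reimport.yml -- sketch only; adjust paths and hosts to your setup.
filebeat.prospectors:
  - input_type: log
    paths:
      - /var/tmp/reimport/*.log   # placeholder: the unzipped file(s)
    close_eof: true               # close the harvester once EOF is reached

output.logstash:
  hosts: ["logstash.example.com:5044"]   # placeholder
```

and then something like `filebeat.sh -once -e -c filebeat-reimport.yml`, which exits once all harvesters have closed.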

This way we achieve the following:

  1. we keep the existing Logstash configuration
  2. performance is OK, and there are no high-load issues on the Filebeat nodes
  3. the Logstash persistence buffer can still be used
