Best practice for reimporting files into Logstash?

mw_jko · May 19, 2017, 2:06pm

Hi there,

I'm having trouble re-importing existing log files (about 600 MB each) into Logstash, and I'd like to know the best practice for doing this.

Our setup is: Multiple hosts with filebeat daemons -> logstash -> elasticsearch

When something goes wrong, we need to re-import specific log files from all the hosts into Logstash some days later. By then they have been log-rotated and bzip2-compressed, so they are no longer under Filebeat's control. We already use a custom document_id, so duplicates are not an issue.

I can think of multiple ways to do this:

1. Read the log file and send it to Logstash via curl (http input).
2. Unzip the log file and start the Filebeat CLI with a prospector on the unzipped file.
3. Copy the log file to the Logstash server, unzip it, and add a file input in Logstash for the unzipped file.

1:

The curl solution seems very inefficient to me for that number of log lines. I know I can send multiple events per message, but from time to time we get malformed log lines (from a third-party application), and Logstash will discard the complete bulk when one event is malformed. So we would probably lose 99 events in a batch where we would lose only 1 if we sent them one by one.

2+3:

These seem perfect to me, but how can I check that the whole file has been processed, so that I know it is safe to remove it?

Thank you in advance

mw_jko · May 22, 2017, 8:15am

Can somebody help me out with some advice on this topic please?

n.maire · May 22, 2017, 1:04pm

Hi,

Here's how I usually do it: fire up a couple of Logstash VMs, NFS-mount the logs I want to import, and bzcat them into a stdin input on Logstash with two outputs, one to ES and the other to stdout, so I can see when the process is over.
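For anyone who would rather script this than type the pipe by hand, here is a minimal Python sketch of the same idea (decompress the bzip2 file and feed it to a Logstash process on stdin). The command line and the `reimport.conf` file name are assumptions for illustration; the config is assumed to define a stdin input plus the Elasticsearch and stdout outputs described above.

```python
import bz2
import shutil
import subprocess


def stream_bz2(path, sink, chunk_size=1024 * 1024):
    """Decompress the bzip2 file at `path` and copy the bytes into `sink`."""
    with bz2.open(path, "rb") as src:
        shutil.copyfileobj(src, sink, chunk_size)


def reimport(path, logstash_cmd):
    """Feed one rotated log into a process that reads events from stdin.

    `logstash_cmd` is a hypothetical command line, for example
    ["bin/logstash", "-f", "reimport.conf"].
    """
    proc = subprocess.Popen(logstash_cmd, stdin=subprocess.PIPE)
    try:
        stream_bz2(path, proc.stdin)
    finally:
        proc.stdin.close()  # closing stdin signals end of input to the child
    return proc.wait()
```

Streaming in chunks avoids loading a 600 MB file into memory, which is roughly what `bzcat file.bz2 | logstash -f reimport.conf` does on the shell.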

mw_jko · May 23, 2017, 12:01pm

Thank you Nicolas,

I just played around a bit more and it looks like my final solution will be the following:

(The following applies to each Filebeat node, but that's what deployment pipelines are for, right?)

1. Copy and unzip the file to re-import.
2. Deploy a custom filebeat.yml (with close_eof).
3. Run filebeat.sh on the command line with the -once parameter and the custom filebeat.yml.

This way we achieve the following:

- We keep the existing Logstash configuration.
- Performance is OK, with no issues from high load on the Filebeat nodes.
- The Logstash persistence buffer can be used.
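In case it helps someone, the three steps could be scripted roughly like this. It is only a sketch: the prospector syntax assumes Filebeat 5.x, the Logstash host and port are placeholders, and the function names and the `filebeat` binary name are made up for illustration (adjust to filebeat.sh or whatever your install uses).

```python
import bz2
import shutil
import subprocess
from pathlib import Path

# Assumed Filebeat 5.x syntax; the output host is a placeholder.
FILEBEAT_TEMPLATE = """\
filebeat.prospectors:
  - input_type: log
    paths:
      - {log_path}
    close_eof: true
output.logstash:
  hosts: ["{logstash_host}"]
"""


def prepare_reimport(archive, workdir, logstash_host="logstash.example:5044"):
    """Unzip `archive` (a .bz2 file) into `workdir` and write a one-off filebeat.yml."""
    workdir = Path(workdir).resolve()
    workdir.mkdir(parents=True, exist_ok=True)
    log_path = workdir / Path(archive).stem  # "app.log.1.bz2" -> "app.log.1"
    with bz2.open(archive, "rb") as src, open(log_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    config_path = workdir / "filebeat.yml"
    config_path.write_text(
        FILEBEAT_TEMPLATE.format(log_path=log_path, logstash_host=logstash_host)
    )
    return log_path, config_path


def run_once(config_path, filebeat_bin="filebeat"):
    """Run Filebeat with -once so it exits after the harvesters reach EOF."""
    return subprocess.call([filebeat_bin, "-once", "-e", "-c", str(config_path)])
```

Calling `prepare_reimport("app.log.1.bz2", "/tmp/reimport")` and then `run_once(config_path)` on each node would cover the copy, deploy and run steps; the paths here are examples only.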

system · June 20, 2017, 12:02pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
