Parse the past - How to manage my log files


I have installed an ELK stack in production to get stats from our CDN logs. I have about 6 months of past logs to parse, with between 50 and 200 million events per day.

Logs are stored as gzip files. Currently, I have a script that uncompresses the logs (40 per batch) into a directory watched by Filebeat.
But I have no way to know when the 40 files have been parsed so I can start another batch, so I'm doing it by hand... Any idea how my script could know when Filebeat has finished?
I thought I could base it on the registry, but there is no info there about whether the files have been read to the end.

I would recommend using the registry. Take the offset stored in the registry and compare it with the file size: if size == offset, Filebeat has finished reading that file.
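A minimal sketch of that check, assuming the legacy single-file registry format (a JSON array of entries with `source` and `offset` keys; the registry's location and layout vary by Filebeat version, so adjust accordingly):

```python
import json
import os

def unfinished_files(registry_path):
    """Return the paths Filebeat has not yet read to the end.

    Assumes the legacy registry format: a JSON array of entries,
    each with 'source' (file path) and 'offset' (bytes read) keys.
    """
    with open(registry_path) as f:
        entries = json.load(f)
    pending = []
    for entry in entries:
        path = entry["source"]
        offset = entry.get("offset", 0)
        # A file is fully read once the recorded offset equals its size.
        if os.path.exists(path) and os.path.getsize(path) > offset:
            pending.append(path)
    return pending
```

If this comes back empty for the current batch, the script can safely archive those 40 files and rotate in the next batch.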

Otherwise, have a look at the -once option, but that would mean starting Filebeat anew for every batch.

Oh, I hadn't realized that offset == file size means done, nice! I finally managed it by logging Filebeat's activity to a file and parsing it for the 'File is inactive: path_to_the/file' messages.
So now I have a pretty nice log file manager script cronned every 5 minutes. It uncompresses 40 logs, moves them to the Filebeat watch folder, and archives them when finished.
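For anyone else going this route, here is a rough sketch of the log-parsing part, assuming Filebeat's "File is inactive: <path>. Closing because close_inactive of ... reached." wording (function name and sample message are illustrative, and the exact message may differ between Filebeat versions):

```python
import re

# Matches the harvester's close message, e.g.
# "... File is inactive: /watch/file.log. Closing because close_inactive of 5m0s reached."
INACTIVE_RE = re.compile(r"File is inactive: (\S+)")

def finished_files(filebeat_log_path):
    """Return the set of paths Filebeat has reported as inactive (fully read)."""
    done = set()
    with open(filebeat_log_path) as f:
        for line in f:
            m = INACTIVE_RE.search(line)
            if m:
                # Drop the sentence period Filebeat appends after the path.
                done.add(m.group(1).rstrip("."))
    return done
```

Once every file in the current batch shows up in this set, the cron script can archive them and uncompress the next 40.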

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.