Sending files in chronological order

Hi,

we have a running ELK stack, which currently looks like this:
16 mail appliances -> syslog server -> file -> Filebeat -> Logstash -> Redis -> some magic script -> ES
The syslog server and Filebeat run on one machine.
Logstash, Redis and the magic script run on another machine.
ES is a separate machine (a cluster, actually).
We use this chain to process the logs of our mail appliances; the log format is very complicated and spans multiple lines.
The "magic script" relies on receiving the logs in chronological order for each mail appliance.
So far so good, as long as no backlog builds up between Filebeat and Logstash.
If Logstash is too slow to process all incoming logs (or is down for some reason), Filebeat builds a backlog, and as soon as Logstash is back up Filebeat starts to send that backlog.
But the backlog does not seem to be sent "first in, first out".
It looks like Filebeat simply has a bunch of files and starts sending the contents of all of them.
Let me make this a bit clearer:
Each mail appliance has its own log file for the current day, let's say "2017-03-23_mail1.log", and Filebeat sends the contents of that file to Logstash.
Now suddenly Logstash can't handle the load and Filebeat starts to slow down.
Filebeat has not finished sending the contents of that file when the syslog server creates a new file for the next day, "2017-03-24_mail1.log", and Filebeat starts sending the contents of this file as well.
Further down the chain, our "magic script" now sees timestamps arriving out of order: one log line carries the timestamp "2017-03-24 00:00:10" and the next one "2017-03-23 22:00:22".
This is a big problem for us; we need to make sure that this script gets its logs in chronological order.

So, long story short:
Is there a way to tell Filebeat to start sending the contents of the new file only after it has finished sending the contents of the old file?
For example: if the old file hasn't changed for one minute -> start sending the new file.

Or are we missing something? Is there another solution?
Cheers
Mario

I'm afraid we currently don't have a way to guarantee the order in which files are sent. Perhaps the magic script could read the logs from Redis sorted by timestamp? Just an idea.
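Very roughly, something like the sketch below (I'm assuming Logstash pushes JSON events onto a Redis list; the list key, the timestamp field name and the reorder delay are all made up and would have to match your setup):

```python
import heapq
import itertools
import json
import time
from datetime import datetime

import redis

REDIS_LIST = "filebeat"              # hypothetical list key used by the Logstash redis output
TS_FIELD = "log_timestamp"           # hypothetical field carrying the mail log's own timestamp
TS_FORMAT = "%Y-%m-%d %H:%M:%S"
REORDER_DELAY = 120                  # seconds to wait for late events; needs tuning

r = redis.Redis(host="localhost", port=6379)
buffer = []                          # min-heap of (epoch, seq, event)
seq = itertools.count()              # tie-breaker for events with identical timestamps


def emit(event):
    # Stand-in for whatever the magic script does with a single event.
    print(event[TS_FIELD], event.get("message"))


while True:
    item = r.blpop(REDIS_LIST, timeout=1)
    if item is not None:
        _, raw = item
        event = json.loads(raw)
        epoch = datetime.strptime(event[TS_FIELD], TS_FORMAT).timestamp()
        heapq.heappush(buffer, (epoch, next(seq), event))

    # Release everything older than the reorder window, in timestamp order.
    # Events that arrive more than REORDER_DELAY after their timestamp will
    # still come out of order, so the delay has to cover the worst expected
    # backlog.
    cutoff = time.time() - REORDER_DELAY
    while buffer and buffer[0][0] <= cutoff:
        _, _, event = heapq.heappop(buffer)
        emit(event)
```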

Hi Tudor,

thanks for your reply.
We also thought about this. The problem is that the script doesn't know whether there are still logs to come between the two timestamps or not, so it doesn't know whether it can continue or should keep waiting for "old logs" to arrive.
The only component that can know this is Filebeat, since it is the one reading the source files.

Filebeat also doesn't know when a file is "finished"; someone can always append to it later. It could treat the appearance of a new file as a signal that the old one is finished, but that would require it to understand the file name patterns, which it currently doesn't.

You could try to play with the harvester_limit option. If you set it to 1, it will guarantee that files are not read in parallel. As the docs say, you probably want to combine that with a small close_timeout or similar option. But getting the timings right is going to be tricky, I think.
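For example, something along these lines (just a sketch; the path and the close_timeout value are placeholders that would need tuning, and you would probably want one such prospector per appliance so that the limit only serializes that appliance's files):

```yaml
filebeat.prospectors:
  - input_type: log
    paths:
      - /var/log/mailappliances/*_mail1.log   # placeholder: the files of one appliance
    # Only one harvester at a time for this prospector, so its files are
    # read one after another instead of in parallel.
    harvester_limit: 1
    # Close the harvester after a while so the next file gets its turn;
    # note that Filebeat still does not guarantee which file is picked next.
    close_timeout: 1m
```
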

Hi Tudor,

thank you for your help, much appreciated.
I think we will try to solve the problem with an external script; that should be more reliable.

Thanks
Mario
