Filebeat harvester picks up files in random order when scanning

I used the harvester_limit: 1 option along with the close_inactive option to close a file after 1 minute. This works for my use case. However, when there are multiple older log files to be harvested, Filebeat picks up the files in a random order instead of going for the oldest file first.
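For context, here is a minimal sketch of the relevant filebeat.yml section (the path is a placeholder, not my actual setup):

```yaml
filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/legacy/*.log   # placeholder path
  harvester_limit: 1          # harvest only one file at a time
  close_inactive: 1m          # close the file handle after 1 minute without new lines
```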

Has anyone faced this issue?

I'm not sure we provide explicit assurances around the order of processing.
It might be worth raising this as a feature request if it's important to you :slight_smile:

I have a solution in my fork. If I raise a pull request do you mind taking
a look?

Indeed, we don't have any guarantees on the ordering. We discussed it in the past but didn't see the need for it, as it would add scheduling complexity (everyone wants a different order). Can you elaborate in more detail on why you need this ordering together with harvester_limit: 1? Understanding the use cases helps a lot.

Happy to also have a look at some code.

We have a legacy system that produces a lot of logs which roll over frequently. Our setup sends logs to Logstash, from where we use the http output to send them on to another legacy monitoring service. The legacy system has wireless connectivity issues, which means we can see gaps during which no logs get shipped over to Logstash. When the connection comes back, we'd like the log files to be sent out in the same order they were created. We can tolerate some amount of re-ordering in the legacy monitoring service, but in Filebeat the scan order is just totally random, which doesn't help at all.

pull request here: https://github.com/elastic/beats/pull/4374


@ruflin I have another question: when older log files are cleaned up, how do I ensure their offsets aren't still tracked in the registry? I assume that when close_inactive is reached, the entry gets removed from the registry?

Check the clean_* options like clean_inactive.
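A minimal sketch of how that could look, reusing the same placeholder path as above. Note that clean_inactive only works together with ignore_older, and must be set to a value greater than ignore_older + scan_frequency:

```yaml
filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/legacy/*.log   # placeholder path
  ignore_older: 24h           # stop harvesting files older than this
  clean_inactive: 25h         # remove registry state; must be > ignore_older + scan_frequency
```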

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.