I used the harvester_limit: 1 option along with the close_inactive option to close a file after 1 minute. This works for my use case. However, when there are multiple older log files to be harvested, Filebeat picks up the files in a random order instead of going for the oldest file first.
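For reference, a minimal sketch of the configuration described above (the path is a placeholder; the exact top-level key depends on the Filebeat version — older releases use `filebeat.prospectors` with `input_type: log` instead of `filebeat.inputs`):

```yaml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/legacy/*.log   # placeholder path
    # Only one file is read at a time, so files are consumed sequentially.
    harvester_limit: 1
    # Close the harvester after 1 minute without new data,
    # freeing the slot for the next file.
    close_inactive: 1m
```

With `harvester_limit: 1`, only one harvester runs at a time, but which pending file gets the freed slot next is not defined.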
I'm not sure we provide explicit assurances around the order of processing.
It might be worth raising this as a feature request if it's important to you.
Indeed, we don't have any guarantees on the ordering. We discussed it in the past but didn't see the need for it, as it would add scheduling complexity (everyone wants a different order). Can you elaborate in more detail on why you need this ordering together with harvester_limit: 1? Understanding the use cases helps a lot.
We have a legacy system that produces a lot of logs which roll over frequently. Our setup ships logs to Logstash, from where we use the http output to send them on to another legacy monitoring service. The system has wireless connectivity issues, which means we can see gaps during which no logs get shipped to Logstash. When the connection comes back, we'd like the log files to be sent out in the same order they were created. We can tolerate some amount of re-ordering in the legacy monitoring service, but with Filebeat the scan order is totally random, which doesn't help at all.
@ruflin I have another question: how do I ensure that, when cleaning up older log files, their offsets aren't still tracked in the registry? I assume that when close_inactive is reached, the file gets removed from the registry?