Harvesting a specified number of the most recent files

Hi,
I am wondering how to harvest only a specified number of the most recent files?
Or, for example, is there any way to not harvest files that have been renamed? (They always need to be read to the end, so I think close_renamed won't work.)
I have such a huge number of log files that I need to limit which files are harvested. Every hour a new file is created in which logs are stored (afterwards it is renamed). I need this new data in ES but can't afford to miss logs that are a bit older. I tried ignore_older, but I have no guarantee that nothing is missed. A file older than ignore_older will never be read, am I right?

Hi @kskubala,

I don't completely understand your problem. Even if you create a new file every hour, Filebeat should be able to harvest all of them.

When the ignore_older option is enabled, Filebeat ignores files that were last modified before the specified timespan.
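
For reference, enabling it looks something like this; a minimal sketch assuming the 5.x prospector syntax, with a hypothetical path and value:

```yaml
filebeat.prospectors:
  - input_type: log
    paths:
      - /var/log/app/*.log   # hypothetical path, adjust to your files
    # Files last modified more than 10m ago are ignored by this prospector.
    ignore_older: 10m
```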

I will explain it step by step.
I have many files that need to be harvested.
I want them to be put into Elasticsearch in real time, so I want the most recent logs written to the files to be stored in ES almost immediately.
When I have many files, scanning and harvesting so many of them means that I don't have recent logs in ES.
As a solution I tried setting ignore_older, but I am worried that some logs will be missed (some logs are written to a file only once per hour). Setting a higher value of ignore_older causes the same thing: no real-time data.

@kskubala could you share the configuration you are using for these files?

Unfortunately I can't. Anyway, there isn't anything related to harvesting settings except for

ignore_older: 5m

Also take a look at the close_inactive option. If ignore_older is set, it has to be set to a greater value than close_inactive, which defaults to 5m.
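
For example, a rough sketch of how the two options fit together; the path and the 10m value are placeholders, again assuming the 5.x prospector syntax:

```yaml
filebeat.prospectors:
  - input_type: log
    paths:
      - /var/log/app/*.log   # hypothetical path
    close_inactive: 5m       # close the handle after 5m with no new data (the default)
    ignore_older: 10m        # must be greater than close_inactive
```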

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.