How does filebeat traverse a path

hburnswell · August 31, 2017, 6:39pm

Hi All,

I'm curious about how filebeat behaves on a given path. My goal is to ingest IIS logs as filebeat -> logstash -> elasticsearch. The IIS log directory has log files going back to Jan 2016 (1 per day). I am using 'ignore_older' in my filebeat.yml file to ingest approximately the last 2 months of log information as:

ignore_older: 1344h

I was unable to use 'd' as a directive as it said it was 'unknown'. Does ignore_older just accept 'h' or 's'?
In Kibana, when I filter the index on log_timestamp, the oldest date I see is: 2017-08-13. Can I assume that the ingest of logs has not yet brought in any logs older than that date?
Is there a better/more efficient way to accomplish what I'm looking to do?

I am using the following in my filebeat.yml file:

paths:
  - C:\path\to\files\file_pattern*

input_type: log
ignore_older: 1344h

Is using the 'file_pattern*' causing any inefficiency? Should I just use \path\to\file\dir if all the files in the directory are specific to what I want to ingest?
How does filebeat traverse the designated path? Does it start with the oldest file timestamp and work its way to the newer files (this would seem not to be the case given the above date filtering on log_timestamp)?
Does filebeat skip files that have a timestamp older than the designated ignore_older value, or does it still do some sort of parsing?

I am happy to read any documentation I can be pointed to. I appreciate any guidance.

Thanks,

HB

andrewkroh · September 1, 2017, 8:18pm

Valid time units are ns, us, ms, s, m, h. There is no 'd' unit, so days have to be written as hours (your 1344h is 56 days). See "Config file data types" in the Beats Platform Reference.
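To make the unit list above concrete, here is a minimal Python sketch. It is not Filebeat's actual parser, and the names `parse_duration` and `days_to_hours` are made up for illustration. It accepts a single number followed by one of the listed units, rejects anything else (such as 'd'), and shows how a number of days turns into an hours value.

```python
import re

# Units from the list above, expressed in seconds.
UNIT_SECONDS = {"ns": 1e-9, "us": 1e-6, "ms": 1e-3, "s": 1.0, "m": 60.0, "h": 3600.0}

def parse_duration(text):
    """Return the duration in seconds; raise ValueError for unknown units such as 'd'.

    Simplified: only a single "<number><unit>" pair is handled.
    """
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([a-z]+)", text)
    if match is None:
        raise ValueError(f"invalid duration {text!r}")
    value, unit = match.groups()
    if unit not in UNIT_SECONDS:
        raise ValueError(f"unknown unit {unit!r} in duration {text!r}")
    return float(value) * UNIT_SECONDS[unit]

def days_to_hours(days):
    """Express a whole number of days as an hours string usable in the config."""
    return f"{days * 24}h"

print(days_to_hours(56))         # value to put after ignore_older for 56 days
print(parse_duration("1344h"))   # the same period in seconds
```

Calling `parse_duration("56d")` raises a ValueError in this sketch, which corresponds to the 'unknown' complaint you saw.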

Are you parsing your logs with Logstash in order to extract the date from the log line into log_timestamp? If so, I'd say the answer is yes, assuming you are using a large enough time range in Kibana when viewing the data.

You are using ignore_older correctly.

The difference is probably negligible. But if all the files in the dir match file_pattern*, I would drop the file name pattern from the glob, leaving just 'C:\path\to\files\*' (the path still has to match the files themselves rather than the directory).

It asks the OS for a directory listing and starts harvesters in the order of the listing provided by the OS. All of the file harvesters run concurrently, so files are not read oldest-first.
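As a rough illustration of that behaviour, the Python sketch below starts one reader thread per file, in whatever order the OS returns the listing, and lets them all run at once. It is an assumption-laden model, not Filebeat's code: `harvest` and `harvest_all` are hypothetical names, and unlike a real harvester each reader stops at end of file instead of waiting for new lines.

```python
import os
import threading

def harvest(path, sink, lock):
    """Read one file line by line and append (file name, line) events to a shared sink."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            with lock:
                sink.append((os.path.basename(path), line.rstrip("\n")))

def harvest_all(directory):
    """Start one reader per file in listing order and wait for all of them."""
    sink, lock, workers = [], threading.Lock(), []
    # os.listdir returns entries in the order the OS provides, not sorted by date.
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            worker = threading.Thread(target=harvest, args=(path, sink, lock))
            worker.start()
            workers.append(worker)
    for worker in workers:
        worker.join()
    return sink
```

Because the readers interleave, events from a newer file can arrive before events from an older one, which is why the oldest date visible in Kibana partway through an ingest does not tell you how far along a particular file is.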

When filebeat gets the directory listing, it checks each file's last modified time. If that time is older than the ignore_older period, then no harvester is started for that file (so it never reads any of the file's content).
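The check described above can be sketched in a few lines of Python. This is only a model of the logic under stated assumptions, not Filebeat's implementation: `files_to_harvest` is a hypothetical name, and the exact handling of a file whose age equals the cutoff is a guess.

```python
import glob
import os
import time

def files_to_harvest(pattern, ignore_older_seconds, now=None):
    """Return the files matching pattern whose last modified time is within the cutoff."""
    now = time.time() if now is None else now
    selected = []
    for path in glob.glob(pattern):
        if os.path.isdir(path):
            continue  # only regular files are candidates
        age = now - os.path.getmtime(path)
        if age <= ignore_older_seconds:
            selected.append(path)  # a reader would be started for this file
        # Older files are skipped based on metadata alone; their content is never opened.
    return selected
```

With a cutoff of 1344 * 3600 seconds, a daily log last modified 100 days ago is dropped at this step, so the large backlog going back to Jan 2016 costs only a metadata lookup per file on each scan.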

