How does filebeat traverse a path


#1

Hi All,

I'm curious about how filebeat behaves on a given path. My goal is to ingest IIS logs as filebeat -> logstash -> elasticsearch. The IIS log directory has log files going back to Jan 2016 (1 per day). I am using 'ignore_older' in my filebeat.yml file to ingest approximately the last 2 months of log information as:

ignore_older: 1344h

  1. I was unable to use 'd' as a unit; filebeat reported it as 'unknown'. Does ignore_older accept only 'h' or 's'?
  2. In Kibana, when I filter the index on log_timestamp, the oldest date I see is: 2017-08-13. Can I assume that the ingest of logs has not yet brought in any logs older than that date?
  3. Is there a better/more efficient way to accomplish what I'm looking to do?

I am using the following in my filebeat.yml file:

paths:
  - C:\path\to\files\file_pattern*

input_type: log
ignore_older: 1344h
  1. Is using the 'file_pattern*' causing any inefficiency? Should I just use \path\to\file\dir if all the files in the directory are specific to what I want to ingest?
  2. How does filebeat traverse the designated path? Does it start with the oldest file timestamp and work its way to the newer files (this would seem not to be the case given the above date filtering on log_timestamp)?
  3. Does filebeat skip files that have an older timestamp than the designated ignore_older directive, or does it still do some sort of parsing?

I am happy to read any documentation I can be pointed to. I appreciate any guidance.

Thanks,

HB


(Andrew Kroh) #2

Valid time units are ns, us, ms, s, m, h. https://www.elastic.co/guide/en/beats/libbeat/current/config-file-format-type.html#_duration
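
Since there is no day unit, a span in days has to be written in hours. A minimal sketch of that arithmetic in Python; the helper name days_to_hours is hypothetical and not part of filebeat:

```python
def days_to_hours(days: int) -> str:
    """Return a duration string in hours, suitable for a setting like ignore_older."""
    # One day is 24 hours; 'h' is the largest unit in the list above.
    return f"{days * 24}h"


print(days_to_hours(56))  # 1344h
```

So the 1344h in the question corresponds to 56 days, or roughly two months.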

Are you parsing your logs with Logstash in order to extract the date from the log line into log_timestamp? If so, I'd say the answer is yes, assuming you are using a large time range in Kibana when viewing the data.

You are using ignore_older correctly.

The difference is probably negligible. But if all the files in the dir match file_pattern* I would drop the file name pattern from the glob leaving just 'C:\path\to\files'.

It asks the OS for a directory listing and starts harvesters in the order of the listing provided by the OS. All of the file harvesters run concurrently.

When filebeat gets the directory listing it checks each file's last modified time. If that time is older than the ignore_older period then no harvester is started for that file (so it never reads any file content).
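
The check described above can be sketched roughly as follows. This is an illustration in Python, not filebeat's actual code (filebeat is written in Go and runs its harvesters concurrently); the function name files_to_harvest and its parameters are assumptions made for the example.

```python
import glob
import os
import time
from typing import List, Optional


def files_to_harvest(pattern: str, ignore_older_seconds: float,
                     now: Optional[float] = None) -> List[str]:
    """Return files matching the glob whose last modified time is recent enough."""
    now = time.time() if now is None else now
    selected = []
    # glob returns matches in whatever order the OS listing yields them,
    # not sorted by timestamp.
    for path in glob.glob(pattern):
        if not os.path.isfile(path):
            continue  # skip directories and other non-files
        age = now - os.path.getmtime(path)
        if age > ignore_older_seconds:
            continue  # older than the cutoff: the file is never opened
        selected.append(path)
    return selected
```

Calling files_to_harvest with the glob from the question and 1344 * 3600 seconds would return only files modified within the last 56 days. Only file metadata is consulted for the ones that are skipped.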

