Parse Nested Airflow Logs with Logstash

I am new to Logstash and the ELK stack as a whole. I am trying to send my Airflow logs to Logstash, but I am confused about how to configure my configuration file, especially because I have several (nested) log files.

My Airflow deployment runs on an AWS EC2 instance, and my logs directory looks something like this: /home/ubuntu/run/logs/scheduler/

The scheduler directory contains a number of dated folders. Taking one of them as an example: /home/ubuntu/run/logs/scheduler/2022-08-31/

The dated folder has files such as:

testing.py.log
hello_world.py.log
dag_file.py.log

While configuring my /etc/logstash/conf.d/ file (based on the log path I shared above), how can I define the path so that it picks up all the logs?

I know Logstash supports glob patterns. Does this mean that if my path is something like this:

path => ["/home/ubuntu/run/logs/*/*/*/*.log"]

it will crawl through the log folder and its subdirectories (as described above) and pick up files with the .log extension?
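
For reference, this is the kind of input block I have in mind. The recursive ** pattern and the sincedb_path setting are assumptions on my part, based on my reading of the file input documentation, and are meant only as a sketch for testing:

input {
        file {
                # ** should match any directory depth under logs/, e.g.
                # scheduler/2022-08-31/testing.py.log
                path => ["/home/ubuntu/run/logs/**/*.log"]
                start_position => "beginning"
                # pointing sincedb at /dev/null while testing so files are re-read
                # from the beginning on every restart (testing-only assumption)
                sincedb_path => "/dev/null"
        }
}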

This is what my /etc/logstash/conf.d/apache-01.conf currently looks like:

input {
        file {
                path => "/home/ubuntu/run/logs/*/*/*/*.log"
                start_position => "beginning"
                codec => "line"
        }
}

filter {
  grok {
    match => { "path" => "/home/ubuntu/run/logs/(?<dag_id>.*?)/(?<task_id>.*?)/(?<execution_date>.*?)/(?<try_number>.*?)\.log$" }
  }
  mutate {
      add_field => {
        "log_id" => "%{[dag_id]}-%{[task_id]}-%{[execution_date]}-%{[try_number]}"
      }
  }
}
output {
        elasticsearch {
                hosts => ["localhost:9200"]
        }
}
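
While testing, I was also thinking of temporarily swapping the elasticsearch output for a stdout output, so I can check whether dag_id, task_id, execution_date and try_number are actually being extracted. This is just a debugging sketch, not my intended final output block:

output {
        # temporary debugging output: prints each event with all parsed fields
        stdout { codec => rubydebug }
}

If I understand the docs correctly, I can also validate the configuration syntax first with bin/logstash --config.test_and_exit -f /etc/logstash/conf.d/apache-01.conf before starting the pipeline.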
