I am new to Logstash and the ELK stack as a whole. I am trying to ship my Airflow logs to Logstash, and I am confused about how to write my configuration file, especially because I have several (nested) log files.
My Airflow is deployed on an AWS EC2 instance, and my logs directory looks something like this: /home/ubuntu/run/logs/scheduler/
The scheduler directory contains a number of dated folders; using one of them as an example: /home/ubuntu/run/logs/scheduler/2022-08-31/
That dated folder contains files such as testing.py.log, hello_world.py.log, and dag_file.py.log.
When writing my pipeline configuration in /etc/logstash/conf.d/, how can I define the path (based on the log layout I shared above) so that it picks up all of the logs?
I know Logstash supports glob patterns. Does this mean that if my path is something like this:
path => ["/home/ubuntu/run/logs/*/*/*/*.log"]
it will crawl through the logs folder and its subdirectories (as described above) and pick up every file with a .log extension?
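To spell out how I think the wildcards line up with my directory layout (this mapping is my own guess, and confirming it is part of what I am asking), here is the input block on its own, with the alternatives I am considering left as comments:

input {
  file {
    # /home/ubuntu/run/logs/*/*/*/*.log expects four levels under logs/:
    #   /home/ubuntu/run/logs/<level1>/<level2>/<level3>/<file>.log
    # but my scheduler files sit only two levels deep:
    #   /home/ubuntu/run/logs/scheduler/2022-08-31/testing.py.log
    # so I am unsure whether I should drop one wildcard:
    #   path => ["/home/ubuntu/run/logs/*/*/*.log"]
    # or use the recursive form (if ** works the way I expect here):
    #   path => ["/home/ubuntu/run/logs/**/*.log"]
    path => ["/home/ubuntu/run/logs/*/*/*/*.log"]
    start_position => "beginning"
  }
}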
This is what my /etc/logstash/conf.d/apache-01.conf currently looks like:
input {
  file {
    path => "/home/ubuntu/run/logs/*/*/*/*.log"
    start_position => "beginning"
    codec => "line"
  }
}
filter {
  grok {
    match => { "path" => "/home/ubuntu/run/logs/(?<dag_id>.*?)/(?<task_id>.*?)/(?<execution_date>.*?)/(?<try_number>.*?).log$" }
  }
  mutate {
    add_field => {
      "log_id" => "%{[dag_id]}-%{[task_id]}-%{[execution_date]}-%{[try_number]}"
    }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}
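While I test this, I was also planning to add a console output alongside the Elasticsearch one so I can see whether the grok fields actually come through; my understanding is that the stdout output with the rubydebug codec is the usual way to do this, but please correct me if not:

output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
  # temporary, only while debugging: prints each parsed event to the console
  stdout {
    codec => rubydebug
  }
}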