Parsing Apache Airflow logs

My main goal is to parse Apache Airflow logs into separate fields using Logstash, feed them into Elasticsearch, and visualise them in Kibana. As far as I can tell, there is no ready-made grok pattern for Airflow logs. I'm fairly new to the ELK Stack, so any help with extracting the important information from Airflow logs would be appreciated.

I've been trying a custom regex because I couldn't find a suitable grok pattern that gives the required result:
\*\s\w*\s\w*\s\w*\:\s\/[\w]*\/[\w]*\/[\w]*\/(?<dag_name>[\w]*)\/(?<task_id>[\w]*)\/(?<trigger_time>[\w\-\:\+]*)\/(?<file>[\w\.]*)| \[(?<start_time>[\d\-\s\:\,]*)\]\s\{(?<runner>[\w\.]*):(?<line_no>[\d]*)\}\s(?<level>[\w]*)\s\-\s(?<message>[\w\s\<\:\.\-\+]*)

I mainly want to parse the first, second, and last lines of the Airflow log. The fields I want are:

  • dag_name
  • task_name
  • trigger_time
  • No_of_runs
  • start_time
  • end_time
  • message (task exited with code 0)

The first four fields come from the first line. start_time is the timestamp at the beginning of the second line. end_time and message come from the last line of the log: end_time is that line's timestamp and message is its text.
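To make the intended extraction concrete, here is a minimal Python sketch of the same logic, useful for checking the patterns outside Logstash. The sample log lines, the path layout, and the names `HEADER_RE`, `LINE_RE`, `SAMPLE`, and `parse_airflow_log` are all assumptions made for illustration. In particular, it assumes No_of_runs is the number in the log file name (for example `1.log`) and that task_name is the path component after the DAG name.

```python
import re

# First line, assumed shape (hypothetical example):
# "*** Reading local file: /<base dirs>/<dag>/<task>/<trigger_time>/<try>.log"
# Only the last four path components are captured, so the base directory depth does not matter.
HEADER_RE = re.compile(
    r"^\*+ .*?/(?P<dag_name>[^/]+)/(?P<task_name>[^/]+)"
    r"/(?P<trigger_time>[^/]+)/(?P<no_of_runs>\d+)\.log\s*$"
)

# Regular log line, assumed shape:
# "[2019-09-10 10:00:05,123] {taskinstance.py:620} INFO - some message"
LINE_RE = re.compile(
    r"^\[(?P<timestamp>[\d\-\s:,]+)\]\s\{(?P<runner>[\w.]+):(?P<line_no>\d+)\}"
    r"\s(?P<level>\w+)\s-\s(?P<message>.*)$"
)

# Hypothetical sample log, not taken from a real run.
SAMPLE = """\
*** Reading local file: /usr/local/airflow/logs/example_dag/example_task/2019-09-10T10:00:00+00:00/1.log
[2019-09-10 10:00:05,123] {taskinstance.py:620} INFO - Starting attempt 1 of 1
[2019-09-10 10:00:09,456] {logging_mixin.py:95} INFO - Task exited with return code 0
"""


def parse_airflow_log(text):
    """Extract the requested fields from the first, second, and last lines."""
    lines = [line for line in text.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("need at least a header line and one log line")

    header = HEADER_RE.match(lines[0])
    first = LINE_RE.match(lines[1])
    last = LINE_RE.match(lines[-1])
    if header is None or first is None or last is None:
        raise ValueError("log does not match the assumed format")

    fields = header.groupdict()                # dag_name, task_name, trigger_time, no_of_runs
    fields["start_time"] = first["timestamp"]  # timestamp of the second line
    fields["end_time"] = last["timestamp"]     # timestamp of the last line
    fields["message"] = last["message"]        # text of the last line
    return fields


if __name__ == "__main__":
    for key, value in parse_airflow_log(SAMPLE).items():
        print(f"{key}: {value}")
```

Note that this works on the whole file at once, whereas Logstash normally handles one line per event, so combining the first, second, and last lines there would need some way of grouping lines from the same file. The named groups themselves should carry over to a grok `match`, which accepts the `(?<name>...)` syntax already used in the regex above.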

Please do not post pictures of text, just post the text.
