I've been battling with this for days; I can't believe it can be this hard to process a custom log format.
My log lines mainly look like this:
[2020-05-24 13:40:06,414] {{jobs.py:1725}} WARNING - No viable dags retrieved from /usr/local/airflow/dags/some_script.py
But they occasionally have python exceptions which go on to multiple lines.
I've created a custom module under /usr/share/filebeat/module/airflow
├── dags
│   ├── config
│   │   └── main.yml
│   ├── ingest
│   │   └── pipeline.json
│   └── manifest.yml
├── fields.go
├── _meta
│   ├── config.yml
│   ├── docs.asciidoc
│   └── fields.yml
└── module.yml
and enabled this under
/etc/filebeat/modules.d/airflow.yml
- module: airflow
  dags:
    enabled: true
    var.paths:
      - "/path/to/logs/*.log"
My main.yml tries to account for the multiline Python errors:
type: log
paths:
{{ range $i, $path := .paths }}
 - {{$path}}
{{ end }}
exclude_files: [".gz$"]
multiline:
  pattern: '^\['
  negate: true
  match: after
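For reference, those multiline settings mean: any line that does not start with `[` is appended to the preceding event, which is how the traceback lines should get folded into one event. A minimal Python sketch of that grouping logic (the `group_multiline` helper is hypothetical, not Filebeat code):

```python
import re

def group_multiline(lines, pattern=r"^\["):
    """Fold lines that do NOT match `pattern` into the previous event,
    mimicking multiline.negate: true / multiline.match: after."""
    events = []
    for line in lines:
        if re.match(pattern, line) or not events:
            events.append(line)           # line starts a new event
        else:
            events[-1] += "\n" + line     # continuation, e.g. a traceback line
    return events

raw = [
    "[2020-05-24 13:40:06,414] {{jobs.py:1725}} ERROR - Failed to import DAG",
    "Traceback (most recent call last):",
    "ValueError: something went wrong",
    "[2020-05-24 13:40:07,000] {{jobs.py:1725}} WARNING - No viable dags retrieved",
]
events = group_multiline(raw)
# The two traceback lines are folded into the first event, leaving two events.
```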
And this is my ingest/pipeline.json:
{
  "description": "Pipeline for Airflow DAG logs",
  "processors": [
    {
      "grok": {
        "field": "message",
        "patterns": [
          "\\[%{TIMESTAMP:timestamp}\\] {{%{DATA:script_name}:%{NUMBER:dag_id}}} %{LOGLEVEL:level} - %{GREEDYMULTILINE:airflow_message}"
        ],
        "ignore_missing": true,
        "pattern_definitions": {
          "TIMESTAMP": "%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{HOUR}:%{MINUTE}:%{SECOND}"
        }
      }
    },
    {
      "date": {
        "field": "timestamp",
        "target_field": "@timestamp",
        "formats": ["yyyy-MM-dd HH:mm:ss"],
        "ignore_failure": true
      }
    }
  ],
  "on_failure": [
    {
      "set": {
        "field": "error.message",
        "value": "{{ _ingest.on_failure_message }}"
      }
    }
  ]
}
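One way I've tried to think about debugging this outside of Filebeat is Elasticsearch's simulate API: `POST _ingest/pipeline/_simulate` takes the pipeline plus a sample document and returns what the processors would produce. A sketch of a request body (grok processor copied from my pipeline above; host/auth details omitted):

```json
{
  "pipeline": {
    "description": "Pipeline for Airflow DAG logs",
    "processors": [
      {
        "grok": {
          "field": "message",
          "patterns": [
            "\\[%{TIMESTAMP:timestamp}\\] {{%{DATA:script_name}:%{NUMBER:dag_id}}} %{LOGLEVEL:level} - %{GREEDYMULTILINE:airflow_message}"
          ],
          "pattern_definitions": {
            "TIMESTAMP": "%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{HOUR}:%{MINUTE}:%{SECOND}"
          }
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "[2020-05-24 13:40:06,414] {{jobs.py:1725}} WARNING - No viable dags retrieved from /usr/local/airflow/dags/some_script.py"
      }
    }
  ]
}
```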
When I restart Filebeat with debug logging I can see:
2020-05-24T13:40:10Z DBG Publish event: {
  "@timestamp": "2020-05-24T13:40:10.680Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "doc",
    "version": "6.0.1",
    "pipeline": "filebeat-6.0.1-airflow-dags-pipeline"
  },
  "source": "/usr/local/airflow/dags/some_script.py.log",
  "offset": 4355437,
  "message": "[2020-05-24 13:40:06,414] {{jobs.py:1725}} WARNING - No viable dags retrieved from /usr/local/airflow/dags/some_script.py",
  "fileset": {
    "module": "airflow",
    "name": "dags"
  },
  "prospector": {
    "type": "log"
  },
  "beat": {
    "name": "airflow-server-development-useast2-10-100-2-158",
    "hostname": "airflow-server-development-useast2-10-100-2-158",
    "version": "6.0.1"
  }
}
I don't know whether that is supposed to indicate success, but I do not see anything in Kibana.
If I remove the multiline config from the module's config/main.yml, then in Kibana I only get entries whose error.message says: Provided Grok expressions do not match field value
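To sanity-check the pattern's shape independently of grok, here is a rough Python `re` translation of the expression (group names are illustrative; GREEDYMULTILINE is approximated with `.*`, and note it is not a built-in grok pattern, so on the grok side it would need its own pattern_definitions entry). This only shows the escaped-brace layout can match the sample line; grok-specific failures won't surface here:

```python
import re

# Rough re equivalent of:
#   \[%{TIMESTAMP}\] {{%{DATA}:%{NUMBER}}} %{LOGLEVEL} - %{GREEDYMULTILINE}
log_re = re.compile(
    r"\[(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\] "
    r"\{\{(?P<script_name>[^:]+):(?P<line_no>\d+)\}\} "
    r"(?P<level>[A-Z]+) - (?P<airflow_message>.*)",
    re.DOTALL,  # let the message span folded multiline events
)

sample = ("[2020-05-24 13:40:06,414] {{jobs.py:1725}} WARNING - "
          "No viable dags retrieved from /usr/local/airflow/dags/some_script.py")
match = log_re.match(sample)
# match captures timestamp, script_name, line_no, level, airflow_message
```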
Help appreciated as I am on the verge of dumping Filebeat.