How hard is it to add a custom module?

I've been battling with this for days. I can't believe it can be this hard to process a custom log format.
My log lines mainly look like:
[2020-05-24 13:40:06,414] {{jobs.py:1725}} WARNING - No viable dags retrieved from /usr/local/airflow/dags/some_script.py
But they occasionally contain Python exceptions that span multiple lines.
I've created a custom module under /usr/share/filebeat/module/airflow:

├── dags
│   ├── config
│   │   └── main.yml
│   ├── ingest
│   │   └── pipeline.json
│   └── manifest.yml
├── fields.go
├── _meta
│   ├── config.yml
│   ├── docs.asciidoc
│   └── fields.yml
└── module.yml

and enabled this under
/etc/filebeat/modules.d/airflow.yml

- module: airflow
  dags:
    enabled: true
    var.paths:
      - "/path/to/logs/*.log"

My main.yml tries to account for the multiline python errors:

type: log
paths:
{{ range $i, $path := .paths }}
 - {{$path}}
{{ end }}
exclude_files: [".gz$"]
multiline:
  pattern: '^\['
  negate: true
  match: after
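For readers following along, here is a rough Python sketch of what the multiline settings above are meant to do: any line that does not start with `[` is appended to the previous line that does. The function name `group_multiline` and the traceback lines in the sample are made up for illustration, and this only approximates Filebeat's behaviour.

```python
import re


def group_multiline(lines, pattern=r"^\["):
    """Approximate multiline with pattern '^\\[', negate: true, match: after."""
    start = re.compile(pattern)
    events = []
    for line in lines:
        # A line matching the pattern starts a new event. With negate: true
        # and match: after, non-matching lines are appended to the event
        # started by the previous matching line.
        if start.search(line) or not events:
            events.append(line)
        else:
            events[-1] += "\n" + line
    return events


# The first line is from the log above; the rest are invented sample data.
sample = [
    "[2020-05-24 13:40:06,414] {{jobs.py:1725}} WARNING - No viable dags retrieved from /usr/local/airflow/dags/some_script.py",
    "[2020-05-24 13:40:07,001] {{models.py:377}} ERROR - Failed to import: /usr/local/airflow/dags/some_script.py",
    "Traceback (most recent call last):",
    '  File "/usr/local/airflow/dags/some_script.py", line 3, in <module>',
    "ImportError: No module named foo",
]

events = group_multiline(sample)
for event in events:
    print(repr(event))
```

The consequence is that the message field of an exception event contains newline characters, which matters for whatever pattern later has to match the whole message.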

And this is my pipeline.json:

{
  "description": "Pipeline for Airflow DAG logs",
  "processors": [{
    "grok": {
      "field": "message",
      "patterns":[
        "\\[%{TIMESTAMP:timestamp}\\] {{%{DATA:script_name}:%{NUMBER:dag_id}}} %{LOGLEVEL:level} - %{GREEDYMULTILINE:airflow_message}"
      ],
      "ignore_missing": true,
      "pattern_definitions": {
        "TIMESTAMP": "%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{HOUR}:%{MINUTE}:%{SECOND}"
      }
    }
  },
  {
    "date": {
      "field": "timestamp",
      "target_field": "@timestamp",
      "formats": ["yyyy-MM-dd HH:mm:ss"],
      "ignore_failure": true
    }
  }
  ],
  "on_failure" : [{
    "set" : {
      "field" : "error.message",
      "value" : "{{ _ingest.on_failure_message }}"
    }
  }]
}

When I restart filebeat with debugging I can see:

2020-05-24T13:40:10Z DBG  Publish event: {
  "@timestamp": "2020-05-24T13:40:10.680Z",
  "@metadata": {
    "beat": "filebeat",
    "type": "doc",
    "version": "6.0.1",
    "pipeline": "filebeat-6.0.1-airflow-dags-pipeline"
  },
  "source": "/usr/local/airflow/dags/some_script.py.log",
  "offset": 4355437,
  "message": "[2020-05-24 13:40:06,414] {{jobs.py:1725}} WARNING - No viable dags retrieved from /usr/local/airflow/dags/some_script.py",
  "fileset": {
    "module": "airflow",
    "name": "dags"
  },
  "prospector": {
    "type": "log"
  },
  "beat": {
    "name": "airflow-server-development-useast2-10-100-2-158",
    "hostname": "airflow-server-development-useast2-10-100-2-158",
    "version": "6.0.1"
  }
}

I don't know if that is supposed to indicate success, but I do not see anything in Kibana.
If I remove the multiline config from the module's config/main.yml, I only get entries in Kibana with error.message saying: Provided Grok expressions do not match field value

Help appreciated as I am on the verge of dumping Filebeat.

I would say you are nearly there:
manifest.yml: missing
airflow.yml: ok
main.yml: ok
pipeline.json: not ok

Let me add the missing manifest.yml:

module_version: 1.0

var:
  - name: paths
    default:
      - /var/log/airflow/airflow.log*
    os.darwin:
      - /usr/local/airflow/var/log/airflow/airflow.log*
    os.windows:
      - c:/programdata/airflow/var/log/airflow/airflow.log*

ingest_pipeline: ingest/pipeline.json
input: config/main.yml 

You are getting this grok error because the expression uses a custom GREEDYMULTILINE pattern that is never defined in pattern_definitions.
Please correct it like this:

{
  "description": "Pipeline for Airflow DAG logs",
  "processors": [{
    "grok": {
      "field": "message",
      "patterns":[
        "\\[%{TIMESTAMP:timestamp}\\] {{%{DATA:script_name}:%{NUMBER:dag_id}}} %{LOGLEVEL:level} - %{GREEDYMULTILINE:airflow_message}"
      ],
      "ignore_missing": true,
      "pattern_definitions": {
        "TIMESTAMP": "%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{HOUR}:%{MINUTE}:%{SECOND}",
        "GREEDYMULTILINE" : "(.|\n)*"
      }
    }
  },
  {
    "date": {
      "field": "timestamp",
      "target_field": "@timestamp",
      "formats": ["yyyy-MM-dd HH:mm:ss,SSS", "yyyy-MM-dd HH:mm:ss"],
      "ignore_failure": true
    }
  }
  ],
  "on_failure" : [{
    "set" : {
      "field" : "error.message",
      "value" : "{{ _ingest.on_failure_message }}"
    }
  }]
}

I got it working with that one.
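To illustrate why the last capture has to match newlines, here is a small Python sketch. It uses Python's `re` module as a stand-in for grok, so the regexes are hand-written approximations of the TIMESTAMP, DATA, NUMBER and LOGLEVEL patterns, not the real grok definitions, and the multi-line sample event is invented.

```python
import re
from datetime import datetime

PREFIX = (
    r"\[(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}(?:[.,]\d+)?)\] "
    r"\{\{(?P<script_name>.*?):(?P<dag_id>\d+)\}\} "
    r"(?P<level>[A-Z]+) - "
)
# Equivalent of GREEDYMULTILINE "(.|\n)*": also consumes newlines.
MULTI = re.compile(PREFIX + r"(?P<airflow_message>(?:.|\n)*)")
# A plain ".*" stops at the first newline.
SINGLE = re.compile(PREFIX + r"(?P<airflow_message>.*)")

line = (
    "[2020-05-24 13:40:06,414] {{jobs.py:1725}} WARNING - No viable dags "
    "retrieved from /usr/local/airflow/dags/some_script.py"
)
# Invented example of an event after multiline grouping.
event = (
    "[2020-05-24 13:40:07,001] {{models.py:377}} ERROR - Failed to import\n"
    "Traceback (most recent call last):\n"
    "ImportError: No module named foo"
)

first = MULTI.match(line).groupdict()
full = MULTI.match(event).groupdict()
cut = SINGLE.match(event).groupdict()

print(first)
print(repr(full["airflow_message"]))  # whole traceback
print(repr(cut["airflow_message"]))   # first line only

# The captured timestamp includes the milliseconds after the comma, so any
# date format applied to it has to account for them.
parsed = datetime.strptime(first["timestamp"], "%Y-%m-%d %H:%M:%S,%f")
print(parsed.isoformat())
```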

Thanks Andre. I had the manifest, but that GREEDYMULTILINE pattern definition got it working.
I've put it publicly on GitLab in case anyone else needs it in future:


Cheers,
jonny

Ok, perfect. Nice to hear you got it working.
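For anyone debugging a similar pipeline, one option is to test it against Elasticsearch's simulate endpoint (`POST _ingest/pipeline/_simulate`) before involving Filebeat at all. The sketch below only builds and prints the request body; it includes just the grok processor to keep it short, and the variable names are arbitrary.

```python
import json

# Grok processor from the corrected pipeline above.
pipeline = {
    "description": "Pipeline for Airflow DAG logs",
    "processors": [
        {
            "grok": {
                "field": "message",
                "patterns": [
                    "\\[%{TIMESTAMP:timestamp}\\] {{%{DATA:script_name}:%{NUMBER:dag_id}}} "
                    "%{LOGLEVEL:level} - %{GREEDYMULTILINE:airflow_message}"
                ],
                "ignore_missing": True,
                "pattern_definitions": {
                    "TIMESTAMP": "%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{HOUR}:%{MINUTE}:%{SECOND}",
                    "GREEDYMULTILINE": "(.|\n)*",
                },
            }
        }
    ],
}

# One test document, using the log line from the original question.
body = {
    "pipeline": pipeline,
    "docs": [
        {
            "_source": {
                "message": "[2020-05-24 13:40:06,414] {{jobs.py:1725}} WARNING - "
                "No viable dags retrieved from /usr/local/airflow/dags/some_script.py"
            }
        }
    ],
}

request_json = json.dumps(body, indent=2)
print(request_json)
```

Saved to a file, the output could be posted with something like `curl -H 'Content-Type: application/json' -XPOST 'localhost:9200/_ingest/pipeline/_simulate?pretty' -d @body.json`, assuming Elasticsearch listens on localhost:9200. The response shows either the extracted fields or the grok failure for each document.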
