Omit non-JSON lines in log files while still allowing JSON parsing

I've found that I can only apply this pattern (['^{']) to include_lines when I omit JSON parsing from the YAML. The documentation implies that line filtering is done after parsing; if that's the case, how do I filter out the lines that I don't want included?

Examples below

This works:

filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /applogs/*.log*
    include_lines: ['^{']
#    json.message_key: "level"
#    json.keys_under_root: true
#    json.overwrite_keys: true
processors:
  - add_fields:
      fields:
        kibanaspace: "specific-space"

This does not:

filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /applogs/*.log*
    include_lines: ['^{']
    json.message_key: "level"
    json.keys_under_root: true
    json.overwrite_keys: true
processors:
  - add_fields:
      fields:
        kibanaspace: "specific-space"

Given an input log file of:

{"level": "INFO", "workerId": "5cced6bb522d-10-139641205327616", "traceId": null, "message": "Metrics blah", "datetime": "20-01-09 00:17:10:446539"}
{"level": "INFO", "workerId": "5cced6bb522d-10-139641205327616", "traceId": null, "message": "Metrics blah", "datetime": "20-01-09 00:17:10:446539"}
This is a 3rd party log that I dont want captured
Another unstructured log I want filtered out

Yes, line filtering happens after parsing. What happens in this case is that Filebeat tries to parse the incoming line as JSON; if that succeeds, it looks for the key "level" in the JSON and requires its value to be a string beginning with { (which is not what you want).
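To illustrate (a rough, untested sketch reusing the field names from your config): because json.message_key is set to "level", the include_lines pattern is matched against the value of that field, so it only makes sense with a pattern that matches level values, e.g.:

  include_lines: ['^INFO']   # keeps only events whose "level" value starts with INFO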

Right now I don't believe there's a way to really interleave JSON and non-JSON lines. However, if you only want to keep the JSON entries, as in your example, you should be able to just enable JSON decoding without any filtering options, since lines that can't be decoded as JSON are ignored by default. In that case you probably just need:

  json.keys_under_root: true
  json.overwrite_keys: true
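Put together, the full input section would look roughly like this (an untested sketch using the same paths as yours, with include_lines removed):

filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /applogs/*.log*
    # no include_lines; per the note above, lines that can't be decoded as JSON should be ignored
    json.keys_under_root: true
    json.overwrite_keys: true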

For some reason, 99% of my third-party logs that were not JSON were still being successfully parsed by Filebeat and ingested by Logstash/Elasticsearch. So instead, to exclude the unwanted lines after parsing, I dropped any event that lacks a field my JSON logs always have:

processors:
  - drop_event:
      when:
        not:
          has_fields: ["level"]

This worked in filtering out the lines that did not contain the expected level field after parsing. I plan on adjusting my schema to conform to ECS 1.4, but this was a good first step in removing noise.
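For reference, the combined configuration ends up looking roughly like this (an untested sketch; the paths and the kibanaspace field are carried over from my examples above, and json.message_key is left out since I no longer rely on line filtering):

filebeat.inputs:
  - type: log
    enabled: true
    paths:
      - /applogs/*.log*
    json.keys_under_root: true
    json.overwrite_keys: true
processors:
  - add_fields:
      fields:
        kibanaspace: "specific-space"
  - drop_event:
      when:
        not:
          has_fields: ["level"]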

