Parse single array json (Elastic Agent)

I'm trying to parse parsedmarc JSON files. These log files contain a single array with multiple records. I've extracted a single record for testing and am struggling to find the right combination of Filebeat and processor configuration.

If I configure the JSON parser it trips over the array, but with decode_json_fields the fields aren't interpreted either.

The configuration I'm currently testing with:

#Processors:
decode_json_fields:
  fields: ["xml_schema", "policy_published", "records"]
  process_array: true
  max_depth: 4
  target: ""
  overwrite_keys: true
  add_error_key: true
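Thinking about it, decode_json_fields decodes JSON held *inside* the fields you list, and with a log input the raw line lands in the message field, so listing keys that only exist inside the not-yet-decoded JSON gives the processor nothing to work on. Presumably it should target message instead, something like (untested sketch):

```yaml
#Processors:
# decode_json_fields operates on the listed event fields; the raw
# line from a log input is stored in "message", so that is the field
# to decode, not the keys inside the JSON document itself.
- decode_json_fields:
    fields: ["message"]
    process_array: true
    max_depth: 4
    target: ""
    overwrite_keys: true
    add_error_key: true
```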

#Custom configurations:
multiline:
  pattern: '^['
  negate: true
  match:  after

But whatever I do, I get either one document per line as clear text or the entire array.
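In hindsight the multiline pattern above may be part of the problem: in a regular expression an unescaped [ opens a character class, so '^[' is not a valid pattern. An escaped version would look like this (a sketch, not yet tested):

```yaml
#Custom configurations:
multiline:
  # '[' opens a character class in a regex, so it must be escaped
  # to match a literal bracket at the start of a line.
  pattern: '^\['
  negate: true
  match: after
```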

The input json I'm testing with:

[
  {
    "xml_schema": "1.1",
    "report_metadata": {
      "org_name": "outlook.com",
      "org_email": "dmarcreport@microsoft.com",
      "org_extra_contact_info": null,
      "report_id": "89c0dced65764e6ea0a60a85671b4042",
      "begin_date": "2023-10-18 02:00:00",
      "end_date": "2023-10-19 02:00:00",
      "errors": []
    },
    "policy_published": {
      "domain": "example.com",
      "adkim": "r",
      "aspf": "r",
      "p": "none",
      "sp": "none",
      "pct": "100",
      "fo": "1"
    },
    "records": [
      {
        "source": {
          "ip_address": "188.172.137.13",
          "country": "IE",
          "reverse_dns": "outbyoip13.pod17.euw1.zdsys.com",
          "base_domain": "zdsys.com"
        },
        "count": 1,
        "alignment": {
          "spf": true,
          "dkim": false,
          "dmarc": true
        },
        "policy_evaluated": {
          "disposition": "none",
          "dkim": "fail",
          "spf": "pass",
          "policy_override_reasons": []
        },
        "identifiers": {
          "envelope_from": "example.com",
          "header_from": "example.com",
          "envelope_to": "hotmail.fr"
        },
        "auth_results": {
          "dkim": [
            {
              "domain": "zendesk.com",
              "selector": "zendesk1",
              "result": "pass"
            }
          ],
          "spf": [
            {
              "domain": "example.com",
              "scope": "mfrom",
              "result": "pass"
            }
          ]
        }
      }
    ]
  }
]

What I get back is one of two results: either one document per line as clear text, or a message field containing the entire JSON array in a single line. Neither gives me an interpreted index of the reports from the log file.

The JSON above contains one DMARC report; parsedmarc writes new reports to the file by appending them to the main array.

I found something that's working, except it creates an error event for the first line of the log file, which is just "[".

Here's a sanitised version of the event I'd like to avoid or drop:

{
  "_index": ".ds-logs-dmarc-aggregate-2023.10.21-000001",
  "_id": "pyxwXIsBdvocVdgewzFF",
  "_version": 1,
  "_score": 0,
  "_source": {
    "input": {
      "type": "filestream"
    },
    "agent": {
      "type": "filebeat",
      "version": "8.10.2"
    },
    "@timestamp": "2023-10-23T12:09:35.383Z",
    "ecs": {
      "version": "8.0.0"
    },
    "log": {
      "file": {
        "path": "/var/log/parsedmarc/aggregate.json"
      },
      "offset": 0
    },
    "data_stream": {
      "namespace": "aggregate",
      "type": "logs",
      "dataset": "dmarc"
    },
    "elastic_agent": {
      "version": "8.10.2",
      "snapshot": false
    },
    "event": {
      "agent_id_status": "verified",
      "ingested": "2023-10-23T12:09:35Z",
      "dataset": "dmarc"
    },
    "error": {
      "message": "Error decoding JSON: unexpected EOF",
      "type": "json"
    },
    "message": "["
  },
  "fields": {
    "elastic_agent.version": [
      "8.10.2"
    ],
    "agent.type": [
      "filebeat"
    ],
    "host.os.type": [
      "linux"
    ],
    "data_stream.namespace": [
      "aggregate"
    ],
    "input.type": [
      "filestream"
    ],
    "log.offset": [
      0
    ],
    "message": [
      "["
    ],
    "data_stream.type": [
      "logs"
    ],
    "error.type": [
      "json"
    ],
    "error.message": [
      "Error decoding JSON: unexpected EOF"
    ],
    "data_stream.dataset": [
      "dmarc"
    ],
    "log.file.path": [
      "/var/log/parsedmarc/aggregate.json"
    ],
    "event.dataset": [
      "dmarc"
    ]
  }
}

The Elastic Agent, Custom Logs (Filebeat) config I've settled on for now:

#Processors:
#drop_fields:
#  fields: ['message']
#drop_event:
#  when:
#    equals:
#      message: "["

#Custom configurations:
type: filestream
id: aggregate
#exclude_lines: ['^[', '^]']
parsers:
- multiline:
    pattern: '^\ \ {'
    negate: true
    match:  after
- ndjson:
    keys_under_root: true
    add_error_key: true
    overwrite_keys: true

exclude_lines doesn't work to drop the line matching ^[$ from the input. The last line, with the closing bracket, does end up being ignored, which is good. I've also tried putting exclude_lines under parsers, but that either had no effect or dropped everything.

The drop_fields and drop_event processors have been no help either in keeping this event out of Elasticsearch. Neither equals: message: "[" nor has_field: ['message'] appears to work.
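One thing I haven't ruled out is the shape of the processor definition: in Filebeat, processors is a list, so each processor needs a leading dash, and the condition goes under when. Something like this might behave differently (untested sketch):

```yaml
#Processors:
# drop_event discards the whole event when the condition matches;
# here, when the message is exactly the opening bracket of the array.
- drop_event:
    when:
      equals:
        message: "["
```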

Another issue is that the JSON file I have doesn't contain a newline after the closing ] bracket.

The resulting error is:

Error decoding JSON: json: cannot unmarshal string into Go value of type map[string]interface {}

So it turns out that in some cases ignoring errors is the only way to get things to behave: ignore_errors: true

The last thing to resolve now is to drop the error about the first line.

Found the solution to filter the error line:

exclude_lines: ['^\[']

The [ must be escaped, and you can't match the end of the line with $.
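Putting it all together, the complete Custom Logs configuration I've ended up with looks roughly like this (where exactly ignore_errors belongs is my assumption based on the experiments above):

```yaml
#Custom configurations:
type: filestream
id: aggregate
# Drop the opening "[" of the array before parsing; the bracket must
# be escaped, and "$" cannot be used to anchor the end of the line.
exclude_lines: ['^\[']
parsers:
- multiline:
    # parsedmarc indents each report with two spaces, so a line
    # starting with "  {" begins a new report.
    pattern: '^\ \ {'
    negate: true
    match: after
- ndjson:
    keys_under_root: true
    add_error_key: true
    overwrite_keys: true
    # placement assumed from the experiments above
    ignore_errors: true
```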
