Logstash failing to ingest multi-line json log file... I think

I am having trouble getting my data ingested by logstash... I think... but I'm just learning about how to do this.

Someone else stood up Cloud Custodian and is dumping logs to an S3 bucket, and I am attempting to ingest them. The files are .log, but formatted as multi-line JSON (pretty-printed, not single-line). I have read that this formatting makes things harder, but I am also told that Cloud Custodian cannot be configured to write a single-line file anyway. It is what it is. As an experiment, I tried collapsing the log file to a single line, and that did not help, so I must be doing something else wrong anyway.

An example input file:

{
  "policy": {
    "name": "account-cloudtrail-enabled",
    "resource": "account",
    "description": "Checks to make sure CloudTrail is enabled on the account\nfor all regions.\n",
    "filters": [
      {
        "type": "check-cloudtrail",
        "global-events": false,
        "multi-region": false,
        "running": false,
        "file-digest": false
      }
    ]
  },
  "version": "0.9.13",
  "execution": {
    "id": "1ebc9860-6d1a-4e42-b809-0fad544479fe",
    "start": 1638815388.1077602,
    "end_time": 1638815388.935413,
    "duration": 0.8276526927947998
  },
  "config": {
    "region": "us-east-2",
    "regions": [
      "us-east-2"
    ],
    "cache": "~/.cache/cloud-custodian.cache",
    "profile": "CCAdmin",
    "account_id": "353563186465",
    "assume_role": null,
    "external_id": null,
    "log_group": null,
    "tracer": null,
    "metrics_enabled": null,
    "metrics": null,
    "output_dir": "s3://testcclog/custodian/",
    "cache_period": 15,
    "dryrun": false,
    "authorization_file": null,
    "subparser": "run",
    "config": null,
    "configs": [
      "./policies/root_account-compliance.yml"
    ],
    "policy_filters": [],
    "resource_types": [],
    "verbose": null,
    "quiet": null,
    "debug": false,
    "skip_validation": false,
    "command": "c7n.commands.run",
    "vars": null
  },
  "sys-stats": {},
  "api-stats": {
    "iam.ListAccountAliases": 1,
    "cloudtrail.DescribeTrails": 1
  },
  "metrics": [
    {
      "MetricName": "ResourceCount",
      "Timestamp": "2021-12-06T11:29:48.934903",
      "Value": 0,
      "Unit": "Count"
    },
    {
      "MetricName": "ResourceTime",
      "Timestamp": "2021-12-06T11:29:48.934920",
      "Value": 0.8265008926391602,
      "Unit": "Seconds"
    }
  ]
}

My conf file (altered to input just the one file for testing; output section not included here):

input {
    file {
        start_position => "beginning"
        path => "/etc/logstash/sample/cctest1.log"
        sincedb_path => "/dev/null"
    }
}

filter {
    json {
        source => "message"
        target => "cc-data"
    }
    mutate {
        remove_field => ["@timestamp", "@version", "host"]
    }
}

The result is "_jsonparsefailure":

        "_source" : {
          "path" : "/etc/logstash/sample/cctest1.log",
          "tags" : [
            "_jsonparsefailure"
          ],
          "message" : "  \"policy\": {"

and the logs show (in part):

[2021-12-14T17:39:47,760][INFO ][logstash.agent           ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[2021-12-14T17:39:47,804][DEBUG][filewatch.sincedbcollection][main][2590825d4319bd0d59fd3a3624e3664e1d556c1fcec631e70e1e6d0b3e6a891c] open: reading from /dev/null
[2021-12-14T17:39:48,004][DEBUG][filewatch.tailmode.handlers.grow][main][2590825d4319bd0d59fd3a3624e3664e1d556c1fcec631e70e1e6d0b3e6a891c] controlled_read get chunk
[2021-12-14T17:39:48,019][DEBUG][logstash.inputs.file     ][main][2590825d4319bd0d59fd3a3624e3664e1d556c1fcec631e70e1e6d0b3e6a891c] Received line {:path=>"/etc/logstash/sample/cctest1.log", :text=>"{"}
[2021-12-14T17:39:48,050][DEBUG][logstash.codecs.plain    ][main][2590825d4319bd0d59fd3a3624e3664e1d556c1fcec631e70e1e6d0b3e6a891c] config LogStash::Codecs::Plain/@id = "plain_25008cc7-f861-40bc-8915-c838d4b5579a"

and

[2021-12-14T17:39:48,333][WARN ][logstash.filters.json    ][main][abcf413f55725bbbfa9f9906d732319ba13915c121bd7566c4ac806507f71b3e] Error parsing json {:source=>"message", :raw=>"    \"name\": \"account-cloudtrail-enabled\",", :exception=>#<LogStash::Json::ParserError: Unexpected character (':' (code 58)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
 at [Source: (byte[])"    "name": "account-cloudtrail-enabled","; line: 1, column: 12]>}
[2021-12-14T17:39:48,335][WARN ][logstash.filters.json    ][main][abcf413f55725bbbfa9f9906d732319ba13915c121bd7566c4ac806507f71b3e] Error parsing json {:source=>"message", :raw=>"    \"resource\": \"account\",", :exception=>#<LogStash::Json::ParserError: Unexpected character (':' (code 58)): expected a valid value (number, String, array, object, 'true', 'false' or 'null')
 at [Source: (byte[])"    "resource": "account","; line: 1, column: 16]>}
[2021-12-14T17:39:48,337][DEBUG][logstash.filters.json    ][main][abcf413f55725bbbfa9f9906d732319ba13915c121bd7566c4ac806507f71b3e] Running json filter {:event=>{"path"=>"/etc/logstash/sample/cctest1.log", "@version"=>"1", "@timestamp"=>2021-12-14T17:39:48.118Z, "message"=>"    \"description\": \"Checks to make sure CloudTrail is enabled on the account\\nfor all regions.\\n\",", "host"=>"ip-172-31-29-221.us-east-2.compute.internal"}}

Why would it be complaining about a ":"? That is valid json, no?

Can I configure this to read to whole multi-line json object?
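For what it's worth, I later realized the error makes sense once you know the file input emits one event per line: each line is handed to the json filter on its own, and a fragment like `"name": "account-cloudtrail-enabled",` is not a complete JSON document by itself, even though the whole file is valid. A quick sketch in Python (my own test, nothing to do with Logstash internals) shows the same failure:

```python
import json

# The whole document parses fine as JSON...
whole = '{"policy": {"name": "account-cloudtrail-enabled"}}'
print(json.loads(whole)["policy"]["name"])  # account-cloudtrail-enabled

# ...but a single line pulled out of the pretty-printed file does not.
fragment = '    "name": "account-cloudtrail-enabled",'
try:
    json.loads(fragment)
except json.JSONDecodeError as e:
    # Python reports the trailing ':' as extra data after the bare string;
    # Jackson (which Logstash uses) reports the unexpected ':'. Same root
    # cause either way: a line fragment is not a complete JSON document.
    print("parse failed:", e.msg)
```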

If I comment out the json filter, 'message' contains each line of the log file as plain text rather than parsed JSON:

        "_source" : {
          "path" : "/etc/logstash/sample/cctest1.log",
          "message" : "  \"policy\": {"

OK. I was able to figure out several things:

First, I am having two problems at once, which makes learning a new thing rather difficult. The default behavior of Logstash is to treat each line as a separate event. And, separately, this part of the JSON is not parsing:

  "metrics": [
    {
      "MetricName": "ResourceCount",
      "Timestamp": "2021-12-06T11:29:48.934903",
      "Value": 0,
      "Unit": "Count"
    },
    {
      "MetricName": "ResourceTime",
      "Timestamp": "2021-12-06T11:29:48.934920",
      "Value": 0.8265008926391602,
      "Unit": "Seconds"
    }
  ]

I was able to solve the multiline thing with the help of this page: Parsing array of json objects with logstash and injesting to elastic

I changed my input to:

file {
    start_position => "beginning"
    path => "/etc/logstash/sample/cctest1.log"
    sincedb_path => "/dev/null"
    codec => multiline {
        pattern => "^({|\[)\s*$"
        negate => true
        auto_flush_interval => 1
        multiline_tag => ""
        what => "previous"
    }
}

So a line that contains only a "{" or a "[" (with possible whitespace after it) starts a new event, and every other line is appended to the previous one.
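To sanity-check my reading of that pattern, here is a quick Python sketch (my own experiment, not part of the Logstash config) showing which lines of the sample file would match `^({|\[)\s*$` and therefore start a new event:

```python
import re

# Same pattern as the multiline codec: a line that is only "{" or "["
# (optionally followed by whitespace) starts a new event; with
# negate => true and what => "previous", every non-matching line is
# appended to the event in progress.
pattern = re.compile(r"^({|\[)\s*$")

lines = [
    "{",                                          # starts a new event
    "[",                                          # starts a new event
    '  "policy": {',                              # appended to previous
    '    "name": "account-cloudtrail-enabled",',  # appended to previous
]
for line in lines:
    starts_new = bool(pattern.match(line))
    print(f"{line!r:50} -> new event: {starts_new}")
```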

Even so, it still will not parse the JSON. I figure I need to do a "split", but I am not understanding that well enough, I guess.

The problem looks a lot like the one I referenced above. But "split { field => "someField" }" makes no sense to me.
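In case it helps anyone who lands here: my reading of that answer is that split takes one event whose field contains an array and emits one event per array element. So with the json filter writing into cc-data, something like this sketch (untested on my end; the field names come from my sample file) should emit one event per entry in the metrics array:

```
filter {
    json {
        source => "message"
        target => "cc-data"
    }
    split {
        field => "[cc-data][metrics]"
    }
}
```

Each resulting event would then carry a single metric object in [cc-data][metrics], with the rest of the document duplicated across the events.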

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.