How to split sometimes?

So I have to deal with ingesting logs that a team member is generating. I'm getting json logs, and I think I am very close to getting the data. (So close.) The error I am getting now looks like this:

[2022-01-04T18:02:35,797][WARN ][logstash.filters.split   ][main][22128956b6f75b779b56d873121a3c929244dd86fc4ec7540465d96c7b838712] Only String and Array types are splittable. field:[cc-data][metrics] is of type = NilClass

If I understand it right, I set up a filter to split on the above field:

split { field => "[cc-data][metrics]" }

The reason is that in some of the json docs, "metrics" is an array of objects:

"metrics": [
  {
    "MetricName": "ResourceCount",
    "Timestamp": "2021-12-06T11:29:48.934903",
    "Value": 0,
    "Unit": "Count"
  },
  {
    "MetricName": "ResourceTime",
    "Timestamp": "2021-12-06T11:29:48.934920",
    "Value": 0.8265008926391602,
    "Unit": "Seconds"
  }
]

But in others, it is not. First, I figured that I was getting an error for the json docs that don't have "metrics" at all, so I wrapped the above in an "if" block:

    if "[cc-data][metrics]" {
        split {
            field => "[cc-data][metrics]"
        }
    }

But I still get those errors.

Here are some raw logs (unchanged, except that I prettied up the json a bit for readability):

[2022-01-04T23:48:20,405][DEBUG][logstash.filters.json    ][main][24e1801237c63a264f56bf4079e66115a5248dce5840ddf6281d9c3669ca1f30] 
Event after json filter {:event=>{
	"@timestamp"=>2022-01-04T23:48:20.300Z, 
	"message"=>"{
		"policy": {
			"name": "cis-check-config-is-enabled",
			"resource": "aws.account",
			"comment": "CIS Amazon Web Services Foundations v1.1.0 (2.5)",
			"filters": [
				{
					"type": "check-config",
					"all-resources": true,
					"global-resources": true,
					"running": true
				}
			],
			"mode": {
				"schedule": "rate(24 hours)",
				"type": "periodic",
				"role": "arn:aws:iam::353563186465:role/CCLam",
				"tags": {
					"custodian-info": "mode=periodic:version=0.9.13"
				}
			},
			"actions": [
				{
					"type": "notify",
					"action_desc": "Enable AWS Config Service to meet benchmarks.",
					"to": [
						"slack"
					],
					"transport": {
						"type": "sqs",
						"queue": "c7nMessageQueue"
					},
					"violation_desc": "AWS Config Service must be enabled in all regions. CIS Amazon Web Services Foundations v1.1.0 (2.5)"
				}
			]
		},
		"version": "0.9.13",
		"execution": {
			"id": "7f75ad4c-1d13-4be4-a60f-11d6d0963f07",
			"start": 1637885511.6461983,
			"end_time": 1637885512.104058,
			"duration": 0.45785975456237793
		},
		"config": {
			"region": "us-east-2",
			"regions": [],
			"cache": "",
			"profile": null,
			"account_id": "353563186465",
			"assume_role": null,
			"external_id": null,
			"log_group": null,
			"tracer": "default",
			"metrics_enabled": false,
			"metrics": null,
			"output_dir": "s3://testcclog/custodian/",
			"cache_period": 0,
			"dryrun": false,
			"authorization_file": null
		},
		"sys-stats": {},
		"api-stats": {
			"iam.ListAccountAliases": 2,
			"config.DescribeDeliveryChannels": 1,
			"config.DescribeConfigurationRecorders": 1,
			"sqs.SendMessage": 1
		},
		"metrics": [
			{
				"MetricName": "ResourceCount",
				"Timestamp": "2021-11-26T00:11:51.829765",
				"Value": 1,
				"Unit": "Count"
			},
			{
				"MetricName": "ResourceTime",
				"Timestamp": "2021-11-26T00:11:51.829781",
				"Value": 0.1658191680908203,
				"Unit": "Seconds"
			},
			{
				"MetricName": "PolicyException",
				"Timestamp": "2021-11-26T00:11:52.104033",
				"Value": 1,
				"Unit": "Count"
			}
		]
	}", 
	"@version"=>"1", 
	"cc-data"=>{
		"metrics"=>[
			{
				"MetricName"=>"ResourceCount", 
				"Timestamp"=>"2021-11-26T00:11:51.829765", 
				"Value"=>1, 
				"Unit"=>"Count"
			}, 
			{
				"MetricName"=>"ResourceTime", 
				"Timestamp"=>"2021-11-26T00:11:51.829781", 
				"Value"=>0.1658191680908203e0, 
				"Unit"=>"Seconds"
			}, 
			{
				"MetricName"=>"PolicyException", 
				"Timestamp"=>"2021-11-26T00:11:52.104033", 
				"Value"=>1, 
				"Unit"=>"Count"
			}
		], 
		"config"=>{
			"metrics_enabled"=>false, 
			"region"=>"us-east-2", 
			"output_dir"=>"s3://testcclog/custodian/", 
			"metrics"=>nil, 
			"authorization_file"=>nil, 
			"tracer"=>"default", 
			"log_group"=>nil, 
			"cache_period"=>0, 
			"regions"=>[], 
			"assume_role"=>nil, 
			"account_id"=>"353563186465", 
			"dryrun"=>false, 
			"profile"=>nil, 
			"cache"=>"", 
			"external_id"=>nil
		}, 
		"version"=>"0.9.13", 
		"sys-stats"=>{}, 
		"execution"=>{
			"start"=>0.16378855116461983e10, 
			"duration"=>0.45785975456237793e0, 
			"end_time"=>0.1637885512104058e10, 
			"id"=>"7f75ad4c-1d13-4be4-a60f-11d6d0963f07"
		}, 
		"api-stats"=>{
			"iam.ListAccountAliases"=>2, 
			"config.DescribeConfigurationRecorders"=>1, 
			"sqs.SendMessage"=>1, 
			"config.DescribeDeliveryChannels"=>1
		}, 
		"policy"=>{
			"comment"=>"CIS Amazon Web Services Foundations v1.1.0 (2.5)", 
			"name"=>"cis-check-config-is-enabled", 
			"mode"=>{
				"type"=>"periodic", 
				"schedule"=>"rate(24 hours)", 
				"tags"=>{
					"custodian-info"=>"mode=periodic:version=0.9.13"
				}, 
				"role"=>"arn:aws:iam::353563186465:role/CCLam"
			}, 
			"filters"=>[
				{
					"all-resources"=>true, 
					"type"=>"check-config", 
					"global-resources"=>true, 
					"running"=>true
				}
			], 
			"actions"=>[
				{
					"to"=>["slack"], 
					"type"=>"notify", 
					"action_desc"=>"Enable AWS Config Service to meet benchmarks.", 
					"transport"=>{
						"queue"=>"c7nMessageQueue", 
						"type"=>"sqs"
					}, 
					"violation_desc"=>"AWS Config Service must be enabled in all regions. CIS Amazon Web Services Foundations v1.1.0 (2.5)"
				}
			], 
			"resource"=>"aws.account"
		}
	}
}
}

[2022-01-04T23:48:20,406][WARN ][logstash.filters.split   ][main][4c01b22e2bbae6a111d8abe472b6671dc92d7d227150064b6128069de5e12f70] 
Only String and Array types are splittable. field:[cc-data][metrics] is of type = NilClass

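Try this. You don't wrap field names in double quotes in a conditional statement. A quoted value is treated as a string literal rather than a field reference, so the condition is always true and the split runs on every event.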

if [cc-data][metrics] {
 split {
  field => "[cc-data][metrics]"
 }
}

If it is just a question of the field being an array or nil, then the answer from @aaron-nimocks is fine. If you also have to handle the case where the field is sometimes a hash or something else that is not an array, then test for the first entry in the array:

if [cc-data][metrics][0]
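
As a rough sketch of how that guard fits around the split from the original question (the field names come from this thread; the comments describe the intended behavior, not verified output):

```logstash
filter {
  # Field reference, not a quoted string: this only resolves when
  # [cc-data][metrics] has a first element, i.e. it is a non-empty array.
  if [cc-data][metrics][0] {
    split {
      field => "[cc-data][metrics]"
    }
  }
  # Events where metrics is nil, missing, or not an array skip the split
  # and pass through unchanged.
}
```

The condition also skips an empty array, which should be harmless here because there would be nothing to split.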

I got in the habit of quoting the field names when I got some other error... I don't remember it exactly... but I thought that resolved that other error. While that is a good tip (and I will fuss with that a bit more to see what else is borked in my config due to this), I think that adding the "[0]" is the correct answer to this particular question.

Thank you both for your responses.

At the risk of "scope creep", I will say that I am still getting an error, but a different one. In the case of the data:

"cc-data"=>[
	{
		"CidrBlockAssociationSet"=>[
			{
				"CidrBlock"=>"172.31.0.0/16"
			}
		],
		"CidrBlock"=>"172.31.0.0/16", 
		"State"=>"available"
	}
]

I get the message:

object mapping for [cc-data.State] tried to parse field [State] as object, but found a concrete value

Filter attempt one:

    if [cc-data][State][0] {
        split {
            field => "[cc-data][State]"
        }
    }

Filter attempt two:

    if [cc-data][State][0] and [cc-data][State][1] {
        split {
            field => "[cc-data][State]"
        }
    }

See this answer. Once you have indexed a document in which [cc-data][State] is an object, any event in which [cc-data][State] is text will be rejected.

I think I understand, and an attempt was made. But I still get the error:

Could not index event to Elasticsearch. {
...
			"cc-data"=>[
				{
					"CidrBlockAssociationSet"=>[
						{
							"CidrBlockState"=>{
								"State"=>"associated"
							}, 
							"AssociationId"=>"vpc-cidr-assoc-09d9abe4524d5afdc", 
							"CidrBlock"=>"172.31.0.0/16"
						}
					], 
					"State"=>"available", 
					"IsDefault"=>true
				}
			], 
...
			"error"=>{
				"type"=>"mapper_parsing_exception", 
				"reason"=>"object mapping for [cc-data.State] tried to parse field [State] as object, but found a concrete value"
			}
		}
	}
}

I've made a few attempts...

    if [cc-data][State][0] {
        split { field => "[cc-data][States]" }
    } else {
        if [cc-data][State] {
            mutate { rename => { "[cc-data][State]" => "[cc-data][Stat]" } }
        }
    }

and

    if [cc-data][State][0] and [cc-data][State][1] {
        split { field => "[cc-data][States]" }
    } else {
        if [cc-data][State] {
            mutate { rename => { "[cc-data][State]" => "[cc-data][Stat]" } }
        }
    }

and

    if [cc-data][State][0] {
        mutate { rename => { "[cc-data][State]" => "[cc-data][States]" } }
        split { field => "[cc-data][States]" }
    }

and others less interesting. In my OP, I referenced a different field, "[cc-data][metrics]", because both were returning the same error. A similar "if" block seems to have sorted that one, so is some edge case popping up here?

Looking at the fields that made it into ES, I see:

cc-data.State.Code
cc-data.State.Name
cc-data.State.Name.keyword

That should probably be [cc-data][State], not [cc-data][States].

If the first document indexed has [cc-data][State] as an array, then subsequent documents where that field has been split will get that mapping exception. It will not start working until you roll over to a new index.

To avoid that, I just delete the index between runs. :slight_smile:

I reran it configured like this, and still get the same thing:

    if [cc-data][State][0] {
        split { field => "[cc-data][State]" }
    } else {
        if [cc-data][State] {
            mutate { rename => { "[cc-data][State]" => "[cc-data][Stat]" } }
        }
    }
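
One thing that may be worth checking, going by the sample events above: there "cc-data" is printed as an array containing a single object, so a reference like [cc-data][State] may not resolve to anything until that array itself has been split. The following is only a sketch under that assumption. The field name StateText is made up for illustration, and the ruby filter is just one possible way to tell a string State apart from an object State.

```logstash
filter {
  # Assumption: [cc-data] sometimes arrives as an array of objects.
  # Splitting it first yields one event per element, with [cc-data] as a single object.
  if [cc-data][0] {
    split { field => "cc-data" }
  }

  # Move a plain-text State out of the way so it cannot collide with
  # documents where State is an object (cc-data.State.Code / cc-data.State.Name).
  ruby {
    code => '
      state = event.get("[cc-data][State]")
      if state.is_a?(String)
        # "StateText" is a hypothetical target field name
        event.set("[cc-data][StateText]", state)
        event.remove("[cc-data][State]")
      end
    '
  }
}
```

As noted earlier in the thread, an index that already maps cc-data.State as an object keeps that mapping, so a change like this would need to be tried against a fresh index.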
