Hi All,
I use the json filter plugin to parse log entries in JSON format, such as the one below:
{"actor_ip":"xxx","from":"Api::ActionsRunnerRegistration#POST","actor":"xxx","actor_id":2480,"org":"xxx","org_id":13,"action":"org.remove_self_hosted_runner","created_at":1669056434332,"data":{"user_agent":"GitHubActionsRunner-linux-x64/2.299.1 ClientId/3xxx RunnerId/229517 GroupId/2 CommitSHA/xxx","controller":"Api::ActionsRunnerRegistration","request_id":"d33b7a32-a424-46eb-82c2-232b30eff9f4","request_method":"post","request_category":"api","server_id":"10c1e833-0d38-4808-a0b2-7df5c87fac59","version":"v3","auth":"integration_installation","current_user":"xxx","integration_id":240,"installation_id":539,"_document_id":"SgNDkJsRSlmSjOqPkFv-2A","@timestamp":1669056434332,"operation_type":"remove","category_type":"Resource Management","business":"xxx","business_id":1,"actor_location":{"country_code":"US","country_name":"United States","location":{"lat":37.751,"lon":-97.822}}}}
The problem is with the data field, which itself contains embedded JSON. As a result, once the events reach my SIEM, it creates new fields there such as data_user_agent, data_created_at, etc. Since this is a GitHub audit log, a lot of different keys can appear under data, which has resulted in more than 500 data_something fields in the table in my SIEM and exceeded its column threshold.
Using the stdout output (rubydebug codec), below is a single entry that shows how the logs are parsed:
{
    "from" => "xxx",
    "org_id" => 1386,
    "created_at" => 1669008758000,
    "tags" => [
        [0] "json",
        [1] "01fixed"
    ],
    "data" => {
        "permissions" => {
            "metadata" => "read",
            "contents" => "write"
        },
        "controller" => "Api::Integrations",
        "aqueduct_job_id" => "xxx",
        "parent_integration_installation_id" => 481,
        "operation_type" => "modify",
        "auth" => "xxx",
        "token_last_eight" => "xxx",
        "expires_at" => "xxx",
        "repository_selection" => "selected",
        "request_method" => "post",
        "job" => "ScopedIntegrationInstallableExpirationExtensionJob",
        "repository_ids" => [
            [0] 8848
        ],
        "integration_id" => 209,
        "integration" => "tca-read-write-content",
        "server_id" => "62079e0a-aa16-47a9-a830-0782295a1b69",
        "request_id" => "80355149-ce81-485c-b440-771027bbb5f3",
        "_document_id" => "s6pSScziiGIU5qPIeiCUDg",
        "scoped_integration_installation_id" => 642448,
        "version" => "v3",
        "scoped_integration_installation" => "scoped_integration_installation-642448",
        "actor_location" => {
            "postal_code" => "xxx",
            "city" => "xxx",
            "country_code" => "xxx",
            "location" => {
                "lat" => 50.1162,
                "lon" => 8.6365
            },
            "country_name" => "xxx",
            "region" => "xxx",
            "region_name" => "xxx"
        },
        "user_agent" => "python-requests/2.26.0",
        "active_job_id" => "9f018dd3-80ed-4529-8464-58189832d9cd",
        "business_id" => 1,
        "business" => "xxx",
        "category_type" => "Other",
        "request_category" => "api",
        "@timestamp" => 1669085347156
    },
    "org" => "xxx",
    "action" => "scoped_integration_installation.extend_expires_at",
    "actor_ip" => "xxx",
    "EventTime" => "2022-11-22T01:49:07.000Z"
}
My question is: can I somehow tell Logstash not to break down the embedded JSON, but instead give me a single field called data (as a string) that I can later convert to the dynamic type in my cloud SIEM and parse at search time? That way I won't exceed the 500-column threshold.
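The only idea I have so far is to re-serialize the parsed data hash back into a single JSON string with a ruby filter placed after the json filter, roughly like this (just a sketch based on the Event API docs, not tested):

ruby {
  # turn the parsed "data" hash back into one JSON string,
  # so the SIEM sees a single "data" column instead of data_* fields
  code => '
    d = event.get("data")
    event.set("data", d.to_json) if d.is_a?(Hash)
  '
}

Is something like this the right approach, or is there a cleaner way to do it directly in the json filter?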