Not breaking down an embedded JSON field and instead keeping it in a single field

Hi All,

I use the json filter plugin to parse log entries in JSON format, such as the one below:

{"actor_ip":"xxx","from":"Api::ActionsRunnerRegistration#POST","actor":"xxx","actor_id":2480,"org":"xxx","org_id":13,"action":"org.remove_self_hosted_runner","created_at":1669056434332,"data":{"user_agent":"GitHubActionsRunner-linux-x64/2.299.1 ClientId/3xxx RunnerId/229517 GroupId/2 CommitSHA/xxx","controller":"Api::ActionsRunnerRegistration","request_id":"d33b7a32-a424-46eb-82c2-232b30eff9f4","request_method":"post","request_category":"api","server_id":"10c1e833-0d38-4808-a0b2-7df5c87fac59","version":"v3","auth":"integration_installation","current_user":"xxx","integration_id":240,"installation_id":539,"_document_id":"SgNDkJsRSlmSjOqPkFv-2A","@timestamp":1669056434332,"operation_type":"remove","category_type":"Resource Management","business":"xxx","business_id":1,"actor_location":{"country_code":"US","country_name":"United States","location":{"lat":37.751,"lon":-97.822}}}}

The problem is with the data field, which contains embedded JSON. As a result, once the events reach my SIEM, it creates new fields there such as data_user_agent, data_created_at, etc.

Because this is a GitHub audit log, the data object can contain many different keys. That resulted in more than 500 data_something fields in the table in my SIEM, which exceeded the column threshold.

Using the stdout output, below is a single entry that shows how the logs are parsed:

          "from" => "xxx",
        "org_id" => 1386,
    "created_at" => 1669008758000,
          "tags" => [
        [0] "json",
        [1] "01fixed"
    ],
          "data" => {
                               "permissions" => {
            "metadata" => "read",
            "contents" => "write"
        },
                                "controller" => "Api::Integrations",
                           "aqueduct_job_id" => "xxx",
        "parent_integration_installation_id" => 481,
                            "operation_type" => "modify",
                                      "auth" => "xxx",
                          "token_last_eight" => "xxx",
                                "expires_at" => "xxx",
                      "repository_selection" => "selected",
                            "request_method" => "post",
                                       "job" => "ScopedIntegrationInstallableExpirationExtensionJob",
                            "repository_ids" => [
            [0] 8848
        ],
                            "integration_id" => 209,
                               "integration" => "tca-read-write-content",
                                 "server_id" => "62079e0a-aa16-47a9-a830-0782295a1b69",
                                "request_id" => "80355149-ce81-485c-b440-771027bbb5f3",
                              "_document_id" => "s6pSScziiGIU5qPIeiCUDg",
        "scoped_integration_installation_id" => 642448,
                                   "version" => "v3",
           "scoped_integration_installation" => "scoped_integration_installation-642448",
                            "actor_location" => {
             "postal_code" => "xxx",
                    "city" => "xxx",
            "country_code" => "xxx",
                "location" => {
                "lat" => 50.1162,
                "lon" => 8.6365
            },
            "country_name" => "xxx",
                  "region" => "xxx",
             "region_name" => "xxx"
        },
                                "user_agent" => "python-requests/2.26.0",
                             "active_job_id" => "9f018dd3-80ed-4529-8464-58189832d9cd",
                               "business_id" => 1,
                                  "business" => "xxx",
                             "category_type" => "Other",
                          "request_category" => "api",
                                "@timestamp" => 1669085347156
    },
           "org" => "xxx",
        "action" => "scoped_integration_installation.extend_expires_at",
      "actor_ip" => "xxx",
     "EventTime" => "2022-11-22T01:49:07.000Z"

My question is: can I somehow tell Logstash not to break down the embedded JSON, and instead give me a single field called data that I can later convert to type dynamic in my cloud SIEM and parse at search time? That way I won't exceed the 500-column threshold.

Since the data field is a key in the source JSON, Logstash will always parse it along with the rest of the document.

What you can do is add the content of the data field to another field using mutate and then remove the data field.

    mutate {
        add_field => {
            "[fieldName]" => "%{[data]}"
        }
        remove_field => ["data"]
    }

After that, the content of fieldName will be a single JSON string:

{"user_agent":"GitHubActionsRunner-linux-x64/2.299.1 ClientId/3xxx RunnerId/229517 GroupId/2 CommitSHA/xxx","controller":"Api::ActionsRunnerRegistration","request_id":"d33b7a32-a424-46eb-82c2-232b30eff9f4","request_method":"post","request_category":"api","server_id":"10c1e833-0d38-4808-a0b2-7df5c87fac59","version":"v3","auth":"integration_installation","current_user":"xxx","integration_id":240,"installation_id":539,"_document_id":"SgNDkJsRSlmSjOqPkFv-2A","@timestamp":1669056434332,"operation_type":"remove","category_type":"Resource Management","business":"xxx","business_id":1,"actor_location":{"country_code":"US","country_name":"United States","location":{"lat":37.751,"lon":-97.822}}}
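For reference, here is a minimal sketch of how the whole filter section might look, assuming the raw JSON line arrives in the message field and using data_raw as a purely hypothetical temporary field name (neither is confirmed by the original post). The conditional is there because an unresolved %{[data]} reference would otherwise be stored as literal text on events that have no data key. The rename sits in a second mutate block because remove_field is only applied after the mutate operations of its own block have run.

    filter {
        # Parse the raw JSON line. "message" as the source field is an assumption.
        json {
            source => "message"
        }

        # Only touch events that actually carry a data object.
        if [data] {
            # Serialize the parsed object into one string field, then drop the object.
            # "data_raw" is a hypothetical temporary name.
            mutate {
                add_field => { "data_raw" => "%{[data]}" }
                remove_field => ["data"]
            }
            # Optional: give the string field the original name back.
            mutate {
                rename => { "data_raw" => "data" }
            }
        }
    }

With this in place the SIEM should receive data as one string column, which can then be converted to a dynamic type and parsed at search time.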

Many thanks @leandrojmp, exactly what I needed :slight_smile:
