Fixing Mapping for Objects in Array (objects in arrays are not well supported)

I started ingesting audit logs from Google Cloud, and I'm getting "Objects in arrays are not well supported" notifications in Kibana for all arrays found in the logs.

What would be the best solution to fix the issue?

  1. Changing the data type from object to nested? (see the mapping sketch after this list)
    -- Kibana doesn't support nested fields, and I'm not sure this is even a good solution.
  2. A parent-child relationship?
    -- Slower query performance.
  3. Denormalizing the data?
    -- Adds more documents to the current index.
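
For reference, option 1 would look something like this as an explicit mapping (a minimal sketch; the index name gcp-audit is made up here, and the rest of the mapping is omitted):

PUT gcp-audit
{
  "mappings": {
    "properties": {
      "protoPayload": {
        "properties": {
          "authorizationInfo": {
            "type": "nested"
          }
        }
      }
    }
  }
}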

If I choose #3 (denormalize the data), how should I go about doing it for the following document?

[
  {
    "protoPayload": {
      "@type": "type.googleapis.com/google.cloud.audit.AuditLog",
      "status": {},
      "authenticationInfo": {
        "principalEmail": "some_email@gmail.com"
      },
      "requestMetadata": {
        "callerIp": "127.1.1.1",
        "callerSuppliedUserAgent": "Windows 10",
        "requestAttributes": {
          "time": "2019-01-01T20:25:12.030662677Z",
          "auth": {}
        },
        "destinationAttributes": {}
      },
      "serviceName": "iam.googleapis.com",
      "methodName": "google.iam.admin.v1.ListServiceAccountKeys",
      "authorizationInfo": [
        {
          "resource": "projects/-/serviceAccounts/55558855585858855",
          "permission": "iam.serviceAccountKeys.list",
          "granted": true,
          "resourceAttributes": {
            "name": "projects/-/serviceAccounts/55558855585858855"
          }
        }
      ],
      "resourceName": "projects/-/serviceAccounts/55558855585858855",
      "request": {
        "@type": "type.googleapis.com/google.iam.admin.v1.ListServiceAccountKeysRequest",
        "name": "projects/test_environment",
        "key_types": [
          1
        ]
      },
      "response": {
        "@type": "type.googleapis.com/google.iam.admin.v1.ListServiceAccountKeysResponse"
      }
    },
    "insertId": "9890890890ddd",
    "resource": {
      "type": "service_account",
      "labels": {
        "project_id": "devops",
        "unique_id": "55558855585858855",
        "email_id": "elastic_devops@email.me"
      }
    },
    "timestamp": "2019-01-01T20:25:11.920552117Z",
    "severity": "INFO",
    "logName": "projects/logging/project/devops",
    "receiveTimestamp": "2019-01-01T20:25:13.057955601Z"
  }
]

To remove the arrays from the data, you could do something like this (taking authorizationInfo as an example):

"authorizationInfo": [
    {
      "resource": "projects/-/serviceAccounts/55558855585858855",
      "permission": "iam.serviceAccountKeys.list",
      "granted": true,
      "resourceAttributes": {
        "name": "projects/-/serviceAccounts/55558855585858855"
      }
    },
   {
      "resource": "projects/-/serviceAccounts/sfasfasfasdfasdf",
      "permission": "iam.serviceAccountKeys.list",
      "granted": true,
      "resourceAttributes": {
        "name": "projects/-/serviceAccounts/sfasfasfasdfasdf"
      }
   }
  ],

becomes

"authorizationInfo.0": 
    {
      "resource": "projects/-/serviceAccounts/55558855585858855",
      "permission": "iam.serviceAccountKeys.list",
      "granted": true,
      "resourceAttributes": {
        "name": "projects/-/serviceAccounts/55558855585858855"
      }
    },
"authorizationInfo.1": 
    {
      "resource": "projects/-/serviceAccounts/sfasfasfasdfasdf",
      "permission": "iam.serviceAccountKeys.list",
      "granted": true,
      "resourceAttributes": {
        "name": "projects/-/serviceAccounts/sfasfasfasdfasdf"
      }
    },
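
If I end up scripting it, I suppose a Logstash ruby filter could produce that shape, something like this (an untested sketch, assuming the array sits at [protoPayload][authorizationInfo]; note that Elasticsearch expands dotted field names such as authorizationInfo.0 back into nested objects, so underscores might be safer in practice):

filter {
  ruby {
    code => '
      # copy each array element to its own numbered field, then drop the array
      info = event.get("[protoPayload][authorizationInfo]")
      if info.is_a?(Array)
        info.each_with_index do |entry, i|
          event.set("[protoPayload][authorizationInfo.#{i}]", entry)
        end
        event.remove("[protoPayload][authorizationInfo]")
      end
    '
  }
}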

@Larry_Gregory what's the best way of doing this? Writing a script, or is there an easy way of splitting arrays in a data set?

Would the Split filter plugin help here?
https://www.elastic.co/guide/en/logstash/current/plugins-filters-split.html
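
From its docs, the split filter clones the event once per element of an array field, so each authorizationInfo entry would end up as its own event, and therefore its own document in Elasticsearch.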

@Larry_Gregory I tried to use the Split filter plugin to flatten arrays, but I'm getting an error.

In the logs, I have a new tag: "_split_type_failure"

In the Logstash log: "[2019-07-23T19:59:44,006][WARN ][logstash.filters.split ] Only String and Array types are splittable. field:[authorizationInfo] is of type = NilClass"

input {
  google_pubsub {
    project_id => "testing"
    topic => "test_topic"
    subscription => "logstash-sub"
    include_metadata => true
    codec => "json"
  }
  # optional, but helpful to generate the ES index and test the plumbing
  heartbeat {
    interval => 10
    type => "heartbeat"
  }
}
filter {
  # don't modify logstash heartbeat events
  if [type] != "heartbeat" {
    mutate {
      add_field => { "messageId" => "%{[@metadata][pubsub_message][messageId]}" }
    }
  }
}
filter {
  if [type] != "heartbeat" {
    split {
      field => "[authorizationInfo]"
    }
  }
}
output {
  stdout { codec => rubydebug }
  elasticsearch {
    hosts => ["https://URL:9243"]
    ssl => true
    user => "XXXX"
    password => "XXXX"
    index => "logstash-gcp-audit-%{+YYYY.MM.dd}"
  }
}

A super simple fix: the array lives under protoPayload, not at the top level, so the field reference needs the full path:

filter {
    split { field => "[protoPayload][authorizationInfo]" }
}
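
One more thing worth noting: heartbeat events (and any other events without that field) will still trip the same NilClass warning, so it may be worth guarding the split with a field-existence check (a sketch using standard Logstash conditional syntax):

filter {
  # only split events that actually carry the array
  if [protoPayload][authorizationInfo] {
    split { field => "[protoPayload][authorizationInfo]" }
  }
}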

So let's say that I now have two objects in this array (your example above). How would they get extracted (split) using this method?

split { field => "[protoPayload][authorizationInfo]" }
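
Based on the split filter's documented behavior, the event should be cloned once per array element, so the two-object example above would come out as two events (two Elasticsearch documents), each carrying a single authorizationInfo object in place of the array, roughly:

# event 1
"protoPayload": {
  "authorizationInfo": {
    "resource": "projects/-/serviceAccounts/55558855585858855",
    "permission": "iam.serviceAccountKeys.list",
    "granted": true,
    "resourceAttributes": {
      "name": "projects/-/serviceAccounts/55558855585858855"
    }
  }
}

# event 2
"protoPayload": {
  "authorizationInfo": {
    "resource": "projects/-/serviceAccounts/sfasfasfasdfasdf",
    "permission": "iam.serviceAccountKeys.list",
    "granted": true,
    "resourceAttributes": {
      "name": "projects/-/serviceAccounts/sfasfasfasdfasdf"
    }
  }
}

All the other top-level fields (timestamp, insertId, and so on) would be duplicated into both events.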
