GCP module - Add cluster_name for GKE / k8s.io logs

Hello,

I am using the GCP module to collect GCP audit logs. I noticed that for GKE / k8s.io logs, the cluster_name is missing from the ingested event.
The field is present in the original event collected from Pub/Sub, but the Filebeat GCP module does not keep it during parsing.

The fix looks quite simple to me: it would only require adding the following code at this location: beats/pipeline.js at master · elastic/beats · GitHub

{
  from: "json.resource.labels.cluster_name",
  to: "orchestrator.cluster.name",
  type: "string"
}

I am using a field that will be available in ECS 1.10. Maybe other fields in the orchestrator "space" should be populated by the GCP module as well when parsing GKE logs?
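
To illustrate the intended effect of that mapping, here is a minimal sketch in plain JavaScript. It is only a stand-in and not the module's code: getPath, setPath and copyField are hypothetical helpers written for this example, whereas the real pipeline.js builds on the Beats processor chain. The sample cluster name is made up.

```javascript
// Hypothetical helpers, not part of Beats: read and write dotted paths on a plain object.
function getPath(obj, path) {
  return path.split(".").reduce(function (cur, key) {
    return cur == null ? undefined : cur[key];
  }, obj);
}

function setPath(obj, path, value) {
  var keys = path.split(".");
  var last = keys.pop();
  var cur = obj;
  keys.forEach(function (key) {
    if (typeof cur[key] !== "object" || cur[key] === null) {
      cur[key] = {};
    }
    cur = cur[key];
  });
  cur[last] = value;
}

// Copy `from` to `to` as a string when the source field exists;
// leave the event untouched otherwise.
function copyField(event, from, to) {
  var value = getPath(event, from);
  if (value !== undefined && value !== null) {
    setPath(event, to, String(value));
  }
  return event;
}

// Made-up sample event containing only the field of interest.
var sample = { json: { resource: { labels: { cluster_name: "my-cluster" } } } };
copyField(sample, "json.resource.labels.cluster_name", "orchestrator.cluster.name");
console.log(sample.orchestrator.cluster.name);
```

Events without json.resource.labels.cluster_name pass through unchanged, so non-GKE logs would not gain an empty orchestrator object.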

I will open an issue on GitHub and create a pull request for that change in a few days. In the meantime, I'll wait for your input and comments on this.

Cheers
Antoine

Hi @aryon,

Thank you for bringing this up and also finding exactly where the enhancement needs to be made in the code — you rock!

I'd encourage you to create a PR for this enhancement as soon as you are ready.

Cheers,

Shaunak

Hi @shaunak,

Thanks for your reply. I dug around the code and the original event a bit more. I don't know if this forum is the right place for this, but I have a few questions, as this is the first time I am contributing to the code.
Here is an example of the original event, before processing by the module:

{
  "insertId":"94170ac4-6e82-4345-98ad-3c780222d19d",
  "labels":{
    "authorization.k8s.io/decision":"allow",
    "authorization.k8s.io/reason":""
  },
  "logName":"projects/redacted/logs/cloudaudit.googleapis.com%2Fdata_access",
  "operation":{
    "first":true,
    "id":"94170ac4-6e82-4345-98ad-3c780222d19d",
    "last":true,
    "producer":"k8s.io"
  },
  "protoPayload":{
    "@type":"type.googleapis.com/google.cloud.audit.AuditLog",
    "authenticationInfo":{
      "principalEmail":"redacted"
    },
    "authorizationInfo":[
      {
        "granted":true,
        "permission":"io.k8s.core.v1.nodes.list",
        "resource":"core/v1/nodes"
      }
    ],
    "methodName":"io.k8s.core.v1.nodes.list",
    "requestMetadata":{
      "callerIp":"redacted",
      "callerSuppliedUserAgent":"GoogleCloudConsole"
    },
    "resourceName":"core/v1/nodes",
    "serviceName":"k8s.io",
    "status":{}
  },
  "receiveTimestamp":"2021-04-23T14:47:31.94822935Z",
  "resource":{
    "labels":{
      "cluster_name":"redacted",
      "location":"redacted",
      "project_id":"redacted"
    },
    "type":"k8s_cluster"
  },
  "timestamp":"2021-04-23T14:47:07.535383Z"
}

I was thinking of renaming resource.labels.type to orchestrator.type, but I saw in the Google documentation (MonitoredResource | Cloud Logging | Google Cloud) that resource.labels.type can have different values that should not go in that ECS field.
So I think a larger change is required, with a new function that would check the resource.labels.type value before copying it to orchestrator.type.
That new function would be added in the processor chain here: beats/pipeline.js at master · elastic/beats · GitHub
Does that make sense?
Cheers,
Antoine

@aryon Do you mean resource.type? Because the orchestrator.* spec is focused on container orchestration and the GCP logs cover much more than just k8s, you'd have to conditionalize the processors with if: ctx.json.resource.type == "k8s_cluster". Then, based on the spec, I would convert json.protoPayload.resourceName to orchestrator.resource.type and hardcode the orchestrator.type field to kubernetes.
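
As a rough sketch of that condition in plain JavaScript (an illustration only, not the Beats processor API; setOrchestratorFields is a hypothetical name, and the event is assumed to be a plain object shaped like the sample above):

```javascript
// Hypothetical sketch: only populate orchestrator.* for GKE cluster events.
function setOrchestratorFields(event) {
  var resource = (event.json && event.json.resource) || {};
  if (resource.type !== "k8s_cluster") {
    return event; // other GCP resource types are left alone
  }
  event.orchestrator = event.orchestrator || {};
  event.orchestrator.type = "kubernetes"; // hardcoded, as suggested above
  var payload = event.json.protoPayload || {};
  if (payload.resourceName) {
    // Unparsed value for now; splitting it into parts is a separate question.
    event.orchestrator.resource = { type: payload.resourceName };
  }
  return event;
}

var gkeEvent = setOrchestratorFields({
  json: {
    resource: { type: "k8s_cluster" },
    protoPayload: { resourceName: "core/v1/nodes" }
  }
});
console.log(JSON.stringify(gkeEvent.orchestrator));
```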

Yes I meant resource.type. I edited my post after I saw an indentation error in the log sample but forgot to correct the rest of the post :roll_eyes:

you'd have to conditionalize the processors with if: ctx.json.resource.type == "k8s_cluster".

Yes that's exactly what I was thinking about!

Then based on the spec, I would convert json.protoPayload.resourceName to orchestrator.resource.type and hardcode the orchestrator.type field to kubernetes.

I was thinking of using a dissect processor on json.protoPayload.resourceName to populate orchestrator.api_version, orchestrator.resource.type and orchestrator.resource.name as it seems these fields should contain "parsed" values based on the examples given here https://github.com/elastic/ecs/blob/master/rfcs/text/0012-orchestrator-field-set.md#kubernetes-audit-log,

so I would say that

orchestrator.resource.type: nodes or core/v1/nodes
orchestrator.api_version: v1
orchestrator.type: kubernetes

The example provided for orchestrator.resource.name here (Orchestrator Fields | Elastic Common Schema (ECS) Reference [1.10] | Elastic) shows that it should be the actual name of the pod (or in this case the node), which doesn't exist for this example since it's listing all the nodes.

Hi guys,
I created this draft PR : [FileBeat] GCP module enhancement - Populate orchestrator.* fields for K8S logs by TonioRyo · Pull Request #25368 · elastic/beats · GitHub
I think it requires more work as I was not able to find a way to dissect the json.protoPayload.resourceName field to populate orchestrator.resource.type or orchestrator.resource.name correctly.
Here are some examples of json.protoPayload.resourceName field values, in case you have any idea on how to dissect them:

coordination.k8s.io/v1/namespaces/kube-system/leases/kube-scheduler
coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager
core/v1/namespaces/kube-system/pods/gadget-ktp98
core/v1/namespaces/kube-system/pods/gadget-6dd74
core/v1/namespaces/kube-system/configmaps/cluster-kubestore
core/v1/namespaces/kube-system/configmaps/clustermetrics
core/v1/namespaces/kube-system/endpoints/managed-certificate-controller
core/v1/namespaces/kube-system/endpoints/gcp-controller-manager
core/v1/namespaces/kube-system/configmaps/gke-common-webhook-lock
core/v1/namespaces/kube-system/endpoints/vpa-recommender
core/v1/namespaces/kube-system/configmaps/ingress-gce-lock
coordination.k8s.io/v1/namespaces/kube-system/leases/snapshot-controller-leader
core/v1/namespaces/kube-system/configmaps/ingress-uid
batch/v1beta1/cronjobs
batch/v1/jobs
core/v1/componentstatuses
core/v1/nodes
core/v1/namespaces
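
One difficulty with these values is that the number of path segments varies (namespaced vs. cluster-scoped, with or without an object name), which a single fixed pattern cannot easily cover. A possible alternative is to split on "/" in script code. The sketch below is plain JavaScript under assumptions: parseResourceName and its output keys are hypothetical names, the layout is assumed to be group/version[/namespaces/ns]/type[/name] as in the samples above, and sub-resources after the name are ignored.

```javascript
// Hypothetical helper: split a resourceName into its parts.
// Assumed layout: <group>/<version>[/namespaces/<ns>]/<type>[/<name>]
function parseResourceName(resourceName) {
  var parts = resourceName.split("/");
  var result = { api_version: parts[1] };
  var rest = parts.slice(2);
  // "namespaces/<ns>/<type>..." means a namespaced resource;
  // a bare "namespaces" or "namespaces/<ns>" refers to the namespace object itself.
  if (rest[0] === "namespaces" && rest.length >= 3) {
    result.namespace = rest[1];
    rest = rest.slice(2);
  }
  if (rest[0]) {
    result.resource_type = rest[0];
  }
  if (rest[1]) {
    result.resource_name = rest[1];
  }
  return result;
}

[
  "core/v1/namespaces/kube-system/pods/gadget-ktp98",
  "coordination.k8s.io/v1/namespaces/kube-system/leases/kube-scheduler",
  "batch/v1beta1/cronjobs",
  "core/v1/namespaces"
].forEach(function (name) {
  console.log(name, JSON.stringify(parseResourceName(name)));
});
```

With this approach, list operations such as core/v1/nodes simply produce no resource_name, so orchestrator.resource.name could be left unset for them.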

I also think there will be an issue with orchestrator.api_version for the following json.protoPayload.resourceName field values. I'll have to check that more thoroughly:

apis/admissionregistration.k8s.io/v1
apis/admissionregistration.k8s.io/v1beta1
apis/apiextensions.k8s.io/v1
apis/apiextensions.k8s.io/v1beta1
apis/apiregistration.k8s.io/v1
apis/apiregistration.k8s.io/v1beta1
apis/apps/v1
apis/authentication.k8s.io/v1
apis/authentication.k8s.io/v1beta1
apis/authorization.k8s.io/v1
apis/authorization.k8s.io/v1beta1
apis/autoscaling/v1
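
For these values the version sits one segment further to the right because of the leading apis prefix. One way to handle both layouts, sketched in plain JavaScript with a hypothetical helper name (extractApiVersion) and covering only the shapes listed in this thread, would be to drop that prefix before reading the version:

```javascript
// Hypothetical helper: read the API version from a resourceName,
// tolerating an optional leading "apis" segment.
function extractApiVersion(resourceName) {
  var parts = resourceName.split("/");
  if (parts[0] === "apis") {
    parts = parts.slice(1); // layout becomes <group>/<version>/...
  }
  return parts.length >= 2 ? parts[1] : undefined;
}

[
  "apis/admissionregistration.k8s.io/v1beta1",
  "apis/apps/v1",
  "core/v1/nodes"
].forEach(function (name) {
  console.log(name, extractApiVersion(name));
});
```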

Cheers
Antoine

I made comments on the PR

Yes, I saw them Alex, thank you! I'll try to find some time tomorrow to address them.
