GCP module - Add cluster_name for GKE / k8s.io logs

aryon · April 26, 2021, 3:57pm

Hello,

I am using the GCP module to collect GCP audit logs. I noticed that for GKE / k8s.io logs, the cluster_name is missing from the ingested event.
The field is present in the original event collected from Pub/Sub, but the Filebeat GCP module does not keep it during parsing.

I think the fix is quite simple: it only requires adding the following field entry at this location in beats/pipeline.js at master · elastic/beats · GitHub

{
    from: "json.resource.labels.cluster_name",
    to: "orchestrator.cluster.name",
    type: "string"
}
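To make the effect of that entry concrete, the plain JavaScript sketch below imitates a rename-style conversion on an ordinary object: the value at json.resource.labels.cluster_name is moved to orchestrator.cluster.name as a string. The helpers getPath, setPath, deletePath and renameField are hypothetical and are not part of Filebeat's script-processor API. Skipping the event when the source field is missing is an assumption modelled on an ignore_missing-style option.

```javascript
// Hypothetical helpers, NOT Filebeat's processor API: they only imitate
// a rename-style convert on a plain JavaScript object.

// Read a dotted path such as "json.resource.labels.cluster_name".
function getPath(obj, path) {
  return path.split(".").reduce(
    (cur, key) => (cur !== null && typeof cur === "object" ? cur[key] : undefined),
    obj
  );
}

// Write a dotted path, creating intermediate objects as needed.
function setPath(obj, path, value) {
  const keys = path.split(".");
  let cur = obj;
  for (const key of keys.slice(0, -1)) {
    if (cur[key] === null || typeof cur[key] !== "object") cur[key] = {};
    cur = cur[key];
  }
  cur[keys[keys.length - 1]] = value;
}

// Remove the leaf of a dotted path if its parent object exists.
function deletePath(obj, path) {
  const keys = path.split(".");
  const parent = keys.length === 1 ? obj : getPath(obj, keys.slice(0, -1).join("."));
  if (parent !== null && typeof parent === "object") delete parent[keys[keys.length - 1]];
}

// Move `from` to `to` as a string; leave the event untouched when `from`
// is absent (assumed ignore_missing-style behaviour).
function renameField(event, from, to) {
  const value = getPath(event, from);
  if (value === undefined) return event;
  setPath(event, to, String(value));
  deletePath(event, from);
  return event;
}

// Usage with a trimmed-down GKE audit event.
const event = { json: { resource: { type: "k8s_cluster", labels: { cluster_name: "my-cluster" } } } };
renameField(event, "json.resource.labels.cluster_name", "orchestrator.cluster.name");
console.log(event.orchestrator.cluster.name); // my-cluster
```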

The target field, orchestrator.cluster.name, will be available in ECS 1.10. Maybe other fields in the orchestrator field set should also be populated by the GCP module when parsing GKE logs?

I will open an issue on GitHub and create a pull request for this change in a few days. In the meantime, I'll wait for your input and comments.

Cheers
Antoine

shaunak · April 26, 2021, 6:31pm

Hi @aryon,

Thank you for bringing this up and also finding exactly where the enhancement needs to be made in the code — you rock!

I'd encourage you to create a PR for this enhancement as soon as you are ready.

Cheers,

Shaunak

aryon · April 27, 2021, 8:52am

Hi @shaunak,

Thanks for your reply. I dug into the code and the original event a bit more. I don't know if this forum is the right place for this, but I have a few questions since this is my first contribution to the code.
Here is an example of the original event, before it is processed by the module:

{
  "insertId":"94170ac4-6e82-4345-98ad-3c780222d19d",
  "labels":{
    "authorization.k8s.io/decision":"allow",
    "authorization.k8s.io/reason":""
  },
  "logName":"projects/redacted/logs/cloudaudit.googleapis.com%2Fdata_access",
  "operation":{
    "first":true,
    "id":"94170ac4-6e82-4345-98ad-3c780222d19d",
    "last":true,
    "producer":"k8s.io"
  },
  "protoPayload":{
    "@type":"type.googleapis.com/google.cloud.audit.AuditLog",
    "authenticationInfo":{
      "principalEmail":"redacted"
    },
    "authorizationInfo":[
      {
        "granted":true,
        "permission":"io.k8s.core.v1.nodes.list",
        "resource":"core/v1/nodes"
      }
    ],
    "methodName":"io.k8s.core.v1.nodes.list",
    "requestMetadata":{
      "callerIp":"redacted",
      "callerSuppliedUserAgent":"GoogleCloudConsole"
    },
    "resourceName":"core/v1/nodes",
    "serviceName":"k8s.io",
    "status":{}
  },
  "receiveTimestamp":"2021-04-23T14:47:31.94822935Z",
  "resource":{
    "labels":{
      "cluster_name":"redacted",
      "location":"redacted",
      "project_id":"redacted"
    },
    "type":"k8s_cluster"
  },
  "timestamp":"2021-04-23T14:47:07.535383Z"
}

I was thinking of renaming resource.labels.type to orchestrator.type, but I saw in the Google documentation (MonitoredResource | Cloud Logging | Google Cloud) that resource.labels.type can take many values that do not belong in that ECS field.
So I think a larger change is required: a new function that checks the resource.labels.type value before mapping it to orchestrator.type.
That new function would be added to the processor chain here: beats/pipeline.js at master · elastic/beats · GitHub
Does that make sense?
Cheers,
Antoine

legoguy1000 · April 27, 2021, 12:11pm

@aryon Do you mean resource.type? Because the orchestrator.* field set is focused on container orchestration and GCP logs cover much more than just k8s, you'd have to conditionalize the processors with if: ctx.json.resource.type == "k8s_cluster". Then, based on the spec, I would convert json.protoPayload.resourceName to orchestrator.resource.type and hardcode the orchestrator.type field to kubernetes.
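The gating idea above can be sketched in plain JavaScript: orchestrator.* is populated only when json.resource.type is "k8s_cluster", orchestrator.type is hardcoded to "kubernetes", and the cluster name is copied when present. The function name addOrchestratorFields is hypothetical and this is an illustration of the condition, not the module's actual code; the condition in the post itself is written in ingest-pipeline style (ctx.json...).

```javascript
// Hypothetical sketch: populate orchestrator.* only for GKE cluster logs.
// Field names follow ECS; the function itself is illustrative, not module code.
function addOrchestratorFields(event) {
  const resource = event.json && event.json.resource;

  // Most GCP audit logs are not Kubernetes logs: leave them untouched.
  if (!resource || resource.type !== "k8s_cluster") {
    return event;
  }

  const orchestrator = event.orchestrator || {};
  orchestrator.type = "kubernetes"; // hardcoded, as suggested above

  const labels = resource.labels || {};
  if (labels.cluster_name !== undefined) {
    orchestrator.cluster = Object.assign({}, orchestrator.cluster, {
      name: String(labels.cluster_name),
    });
  }

  event.orchestrator = orchestrator;
  return event;
}

const gke = { json: { resource: { type: "k8s_cluster", labels: { cluster_name: "prod" } } } };
const gce = { json: { resource: { type: "gce_instance", labels: {} } } };
console.log(addOrchestratorFields(gke).orchestrator); // { type: 'kubernetes', cluster: { name: 'prod' } }
console.log(addOrchestratorFields(gce).orchestrator); // undefined
```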

aryon · April 27, 2021, 1:08pm

Yes, I meant resource.type. I edited my post after I noticed an indentation error in the log sample, but forgot to correct the rest of the post.

you'd have to conditionalize the processors with if: ctx.json.resource.type == "k8s_cluster".

Yes, that's exactly what I was thinking of!

Then based on the spec, I would convert json.protoPayload.resourceName to orchestrator.resource.type and hardcode the orchestrator.type field to kubernetes.

I was thinking of using a dissect processor on json.protoPayload.resourceName to populate orchestrator.api_version, orchestrator.resource.type and orchestrator.resource.name since, based on the examples given here https://github.com/elastic/ecs/blob/master/rfcs/text/0012-orchestrator-field-set.md#kubernetes-audit-log, these fields seem to expect "parsed" values.

legoguy1000 · April 27, 2021, 1:21pm

So I would say that:

orchestrator.resource.type: nodes or core/v1/nodes
orchestrator.api_version: v1
orchestrator.type: kubernetes

The example provided for orchestrator.resource.name here (Orchestrator Fields | Elastic Common Schema (ECS) Reference [1.10] | Elastic) shows that it should be the actual name of the pod (or, in this case, the node), which doesn't exist for this example since it is listing all the nodes.

aryon · April 28, 2021, 8:12am

Hi guys,
I created this draft PR: [FileBeat] GCP module enhancement - Populate orchestrator.* fields for K8S logs by TonioRyo · Pull Request #25368 · elastic/beats · GitHub
I think it requires more work, as I was not able to find a way to dissect the json.protoPayload.resourceName field to populate orchestrator.resource.type or orchestrator.resource.name correctly.
Here are some examples of json.protoPayload.resourceName field values, in case you have any ideas on how to dissect them:

coordination.k8s.io/v1/namespaces/kube-system/leases/kube-scheduler
coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager
core/v1/namespaces/kube-system/pods/gadget-ktp98
core/v1/namespaces/kube-system/pods/gadget-6dd74
core/v1/namespaces/kube-system/configmaps/cluster-kubestore
core/v1/namespaces/kube-system/configmaps/clustermetrics
core/v1/namespaces/kube-system/endpoints/managed-certificate-controller
core/v1/namespaces/kube-system/endpoints/gcp-controller-manager
core/v1/namespaces/kube-system/configmaps/gke-common-webhook-lock
core/v1/namespaces/kube-system/endpoints/vpa-recommender
core/v1/namespaces/kube-system/configmaps/ingress-gce-lock
coordination.k8s.io/v1/namespaces/kube-system/leases/snapshot-controller-leader
core/v1/namespaces/kube-system/configmaps/ingress-uid
batch/v1beta1/cronjobs
batch/v1/jobs
core/v1/componentstatuses
core/v1/nodes
core/v1/namespaces
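The values above appear to follow one shape: an API group (shown as "core" for the core group), a version, an optional namespaces/<ns> pair, the resource type, and an optional object name. Because the number of segments varies, a single fixed dissect pattern seems awkward, so the sketch below splits on "/" in plain JavaScript instead. parseResourceName is a hypothetical helper. Mapping apiVersion, resourceType and resourceName to orchestrator.api_version, orchestrator.resource.type and orchestrator.resource.name is one possible choice; mapping namespace to orchestrator.namespace is an additional assumption not discussed in this thread.

```javascript
// Hypothetical parser for GKE audit protoPayload.resourceName values such as
//   core/v1/nodes
//   core/v1/namespaces/kube-system/pods/gadget-ktp98
//   coordination.k8s.io/v1/namespaces/kube-system/leases/kube-scheduler
// Assumed shape: <group>/<version>[/namespaces/<ns>]/<resource>[/<name>[/...]]
function parseResourceName(resourceName) {
  const parts = resourceName.split("/");

  // Discovery paths such as apis/apps/v1 do not follow this shape; skip them.
  if (parts.length < 3 || parts[0] === "apis") return null;

  const [group, version, ...rest] = parts;
  const result = { group, apiVersion: version };

  // Namespaced objects: namespaces/<ns>/<resource>[/<name>].
  // A bare "namespaces" or "namespaces/<ns>" refers to Namespace objects themselves.
  let remaining = rest;
  if (remaining[0] === "namespaces" && remaining.length >= 3) {
    result.namespace = remaining[1];
    remaining = remaining.slice(2);
  }

  result.resourceType = remaining[0];
  // List calls (e.g. core/v1/nodes) have no object name; anything after the
  // name (a subresource) is ignored in this sketch.
  if (remaining.length >= 2) result.resourceName = remaining[1];
  return result;
}

for (const name of [
  "core/v1/nodes",
  "batch/v1beta1/cronjobs",
  "core/v1/namespaces/kube-system/pods/gadget-ktp98",
  "coordination.k8s.io/v1/namespaces/kube-system/leases/kube-scheduler",
]) {
  console.log(name, "->", parseResourceName(name));
}
```

With this reading, core/v1/nodes yields api version v1 and resource type nodes with no resource name, which matches the expectation that list calls carry no object name.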

Also, I think there will be an issue with orchestrator.api_version for the following json.protoPayload.resourceName field values; I'll have to check that more thoroughly:

apis/admissionregistration.k8s.io/v1
apis/admissionregistration.k8s.io/v1beta1
apis/apiextensions.k8s.io/v1
apis/apiextensions.k8s.io/v1beta1
apis/apiregistration.k8s.io/v1
apis/apiregistration.k8s.io/v1beta1
apis/apps/v1
apis/authentication.k8s.io/v1
apis/authentication.k8s.io/v1beta1
apis/authorization.k8s.io/v1
apis/authorization.k8s.io/v1beta1
apis/autoscaling/v1
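For the apis/... values listed above, a naive split on "/" that treats the first three segments as group, version and resource would put the group name into orchestrator.api_version (apis/apps/v1 would yield "apps" as the version). A minimal sketch, assuming these entries are Kubernetes API discovery requests that carry no resource type or name, is to match that exact shape and extract only the group and version. parseDiscoveryPath is a hypothetical name.

```javascript
// Hypothetical helper for resourceName values of the form apis/<group>/<version>,
// e.g. apis/apps/v1 or apis/admissionregistration.k8s.io/v1beta1.
// These appear to be API discovery requests: there is no resource type or name,
// so only the group and version are extracted.
function parseDiscoveryPath(resourceName) {
  const match = /^apis\/([^\/]+)\/([^\/]+)$/.exec(resourceName);
  if (match === null) return null;
  return { group: match[1], apiVersion: match[2] };
}

console.log(parseDiscoveryPath("apis/apps/v1"));
// { group: 'apps', apiVersion: 'v1' }
console.log(parseDiscoveryPath("apis/authorization.k8s.io/v1beta1"));
// { group: 'authorization.k8s.io', apiVersion: 'v1beta1' }
```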

Cheers
Antoine

legoguy1000 · April 28, 2021, 3:50pm

I made comments on the PR

aryon · April 28, 2021, 4:05pm

Yes, I saw them, Alex, thank you! I'll try to find some time tomorrow to address them.

system · May 26, 2021, 6:05pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
