APM Log Correlation problem

Kibana version: 8.14.0

Elasticsearch version: 8.14.0

APM Server version: 8.13.4

APM Agent language and version: Node.js Agent 4.7.2

I am having trouble with log correlation: the log entry for the trace exists, but it is not visible in the APM UI:

Not working:

Working after removing the OR condition from the search query:

Please guide me on how to resolve this

Hi @Garvit_Jain,

I assume that you are using an ECS Logger alongside the agent? Can you share which one and the version you're using as well?

Hi @carly.richmond,

Yes we are using @elastic/ecs-pino-format version 1.5.0

I don't have an answer here. Talking it out a bit:

Your last screenshot has a log record "json" value of (likely with some typos from OCR):

{"@timestamp":"2024-09-09T04:25:29.336Z","dd":{"env":"staging-cluster", "service": "core-utilities", "span_id":"809727883895043126", "trace_id":"809727883895043126", "version": "60"}, "ecs.version": "8.10.0", "event.dataset":"core-utilities", "host.hostname": "core-utilities-service-6f97b9f874-wxbxq", "log.level": "info", "message": "Health check 1og correlation example", "process.pid":1, "service.name":"core-utilities", "service.node.name": "core-utilities-service-6f97b9f874-wxbxq"}

Pretty-printed:

{
  "@timestamp": "2024-09-09T04:25:29.336Z",
  "dd": {
    "env": "staging-cluster",
    "service": "core-utilities",
    "span_id": "809727883895043126",
    "trace_id": "809727883895043126",
    "version": "60"
  },
  "ecs.version": "8.10.0",
  "event.dataset": "core-utilities",
  "host.hostname": "core-utilities-service-6f97b9f874-wxbxq",
  "log.level": "info",
  "message": "Health check log correlation example",
  "process.pid": 1,
  "service.name": "core-utilities",
  "service.node.name": "core-utilities-service-6f97b9f874-wxbxq"
}

with a trace.id of 4b6d8e12bdcedd553fde9d848277d5ab.
(I'm guessing the "dd" object in the log record is DataDog, but it doesn't matter.)

A log record formatted by @elastic/ecs-pino-format will look something like this:

{"log.level":"info","@timestamp":"2024-09-09T19:22:32.959Z","process.pid":14360,"host.hostname":"peach.local","ecs.version":"8.10.0","service.name":"http-with-elastic-apm","service.version":"1.5.0","service.environment":"development","event.dataset":"http-with-elastic-apm","trace.id":"b1b990ed164a2cde04df413e1a9eed0c","transaction.id":"97bd2b0ff24c0719",...,"message":"handled request"}
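Those trace.id and transaction.id fields come from the active APM agent: the formatter merges the agent's current trace IDs into each record at log time. A simplified sketch of the mechanism (with a stubbed agent object standing in for elastic-apm-node's agent.currentTraceIds; not the real package code):

```javascript
// Stub standing in for elastic-apm-node: agent.currentTraceIds holds the
// IDs of the currently active transaction/span ({} if none is active).
const agent = {
  currentTraceIds: {
    'trace.id': 'b1b990ed164a2cde04df413e1a9eed0c',
    'transaction.id': '97bd2b0ff24c0719',
  },
};

// Roughly what the ECS formatter does on each log call: spread the current
// trace IDs into the record so they land as top-level dotted fields.
function withTraceCorrelation(record) {
  return { ...record, ...agent.currentTraceIds };
}

const rec = withTraceCorrelation({
  'log.level': 'info',
  message: 'handled request',
});
console.log(JSON.stringify(rec));
```

If no transaction is active when the log call happens (e.g. a health check log emitted outside any instrumented request), no trace.id is injected at all.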

Q1: What mechanism are you using to get log records from your app/service into Elastic? filebeat perhaps?

Q2: Can you find that same record in the "Discover" app in Kibana (https://.../app/discover), and then show the JSON of the full document in Elasticsearch? (I'm showing my ignorance here with the "Stream" view and what that "json" column is really showing.) Presumably there is a top-level trace.id field in the actual full document.

That information might help sort out what is going wrong.
I am inclined to think that your "Not working:" screenshot showing no hits for your query with the "OR ..." condition is a bug, but I'm not sure.

Hi @trentm,

The log that is written to the log file from the Kubernetes pod running the Node.js service is:

{
  "log.level": "info",
  "@timestamp": "2024-09-10T16:40:24.030Z",
  "process.pid": 1,
  "host.hostname": "core-utilities-service-f57df9956-l7kw2",
  "ecs.version": "8.10.0",
  "service.name": "core-utilities",
  "service.version": "61",
  "service.environment": "staging",
  "service.node.name": "core-utilities-service-f57df9956-l7kw2",
  "event.dataset": "core-utilities",
  "dd": {
    "trace_id": "5963508668288421859",
    "span_id": "5963508668288421859",
    "service": "core-utilities",
    "version": "61",
    "env": "staging-cluster"
  },
  "trace.id": "2622a33f940097313c14911871f82174",
  "transaction.id": "1887bdfad0df6f9d",
  "message": "Health check log correlation example"
}

We have Elastic Agent running as a DaemonSet on the cluster, with the following config for these particular logs:

inputs:
  - id: container-log-${kubernetes.pod.name}-${kubernetes.container.id}
    type: filestream
    use_output: kafka-logs-production-output
    meta:
      package:
        name: kubernetes
        version: 1.52.0
    data_stream:
      namespace: production
    streams:
      # Stream ID for this data stream allowing Filebeat to track the state of the ingested files. Must be unique.
      # Each filestream data stream creates a separate instance of the Filebeat filestream input.
      - id: container-log-${kubernetes.pod.name}-${kubernetes.container.id}
        data_stream:
          dataset: kubernetes.container_logs
          type: logs
        prospector.scanner.symlinks: true
        parsers:
          - container: ~
          - ndjson:
              target: "json"
              ignore_decoding_error: true
        paths:
          - /var/log/containers/*microservices*${kubernetes.container.id}.log

Here dataset: kubernetes.container_logs makes all logs go into a single data stream. We run around 150 microservices, and per-service data streams were causing a too-many-shards problem; there were also mapping conflicts between services. So we decided to keep each log entry under the json key and index it as an Object.
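In other words, the ndjson parser with target: "json" nests the whole decoded log record under the json key, roughly like this (illustrative sketch only, not Filebeat's actual code):

```javascript
// Illustrative sketch only (Filebeat implements this internally): the ndjson
// parser with target "json" parses each log line and nests the result under
// the "json" key; with ignore_decoding_error, unparseable lines pass through.
function applyNdjsonParser(event) {
  try {
    return { ...event, json: JSON.parse(event.message) };
  } catch (err) {
    return event; // ignore_decoding_error: true — keep the event unchanged
  }
}

const line = '{"log.level":"info","trace.id":"abc123","message":"hi"}';
const out = applyNdjsonParser({ message: line });
console.log(out.json['trace.id']);
```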

The data from Elastic Agent is sent to a Kafka topic. Logstash then reads it from Kafka and pushes it to Elasticsearch with the following pipeline config:

input { kafka { ... } }
filter {
  if [json] and [json][ecs.version] {
    mutate {
      rename => {
        "[json][service.version]" => "service.version"
        "[json][service.environment]" => "service.environment"
        "[json][trace.id]" => "trace.id"
        "[json][transaction.id]" => "transaction.id"
      }
    }
  }
  if [json][message] and [json][message] =~ /^[\s\S]*$/ {
    mutate {
      add_field => { "message" => "%{[json][message]}" }
    }
  }
}
output { elasticsearch { ... } }
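The effect of this filter, expressed in plain JavaScript for illustration only (not actual Logstash code), is to hoist the four correlation fields out of the nested json object to the top level, where the APM UI's log correlation expects them, and to copy json.message to message:

```javascript
// Plain-JavaScript illustration of the Logstash filter's effect: move the
// correlation fields from the nested "json" object to the top level, then
// copy the message field.
function applyCorrelationFilter(event) {
  const renames = ['service.version', 'service.environment', 'trace.id', 'transaction.id'];
  if (event.json && event.json['ecs.version']) {
    for (const field of renames) {
      if (field in event.json) {
        event[field] = event.json[field];
        delete event.json[field]; // rename => a move, not a copy
      }
    }
  }
  if (event.json && typeof event.json.message === 'string') {
    event.message = event.json.message; // add_field equivalent
  }
  return event;
}

const out2 = applyCorrelationFilter({
  json: {
    'ecs.version': '8.10.0',
    'trace.id': '2622a33f940097313c14911871f82174',
    'transaction.id': '1887bdfad0df6f9d',
    message: 'Health check log correlation example',
  },
});
console.log(out2['trace.id'], out2.message);
```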

Document found in Discover:

{
  "_index": ".ds-logs-kubernetes.container_logs-staging-2024.09.10-004751",
  "_id": "ytDO3JEBG768Sj1t7u7G",
  "_version": 1,
  "_score": 0,
  "_source": {
    "container": {
      "image": {
        "name": "RENAMED"
      },
      "runtime": "containerd",
      "id": "c8e449fb43bccd51dca2f0cad83072f8d2af9433e497ea216ed09eedb507137c"
    },
    "kubernetes": {
      "container": {
        "name": "core-utilities"
      },
      "node": {
        "uid": "RENAMED",
        "hostname": "RENAMED",
        "name": "RENAMED",
        "labels": {
          "kubernetes_io/hostname": "RENAMED",
          "topology_kubernetes_io/region": "ap-south-1",
          "provisioner_cast_ai/node-id": "RENAMED",
          "karpenter_k8s_aws/instance-family": "m7g",
          "kubernetes_io/arch": "arm64",
          "topology_ebs_csi_aws_com/zone": "ap-south-1a",
          "karpenter_k8s_aws/instance-network-bandwidth": "15000",
          "karpenter_sh/registered": "true",
          "beta_kubernetes_io/instance-type": "m7g.8xlarge",
          "failure-domain_beta_kubernetes_io/zone": "ap-south-1a",
          "karpenter_k8s_aws/instance-cpu-manufacturer": "aws",
          "beta_kubernetes_io/os": "linux",
          "karpenter_k8s_aws/instance-category": "m",
          "beta_kubernetes_io/arch": "arm64",
          "karpenter_k8s_aws/instance-ebs-bandwidth": "10000",
          "karpenter_k8s_aws/instance-generation": "7",
          "karpenter_k8s_aws/instance-hypervisor": "nitro",
          "topology_kubernetes_io/zone": "ap-south-1a",
          "karpenter_sh/nodepool": "microservice-arm-nodes",
          "failure-domain_beta_kubernetes_io/region": "ap-south-1",
          "karpenter_k8s_aws/instance-encryption-in-transit-supported": "true",
          "topology_k8s_aws/zone-id": "aps1-az1",
          "karpenter_k8s_aws/instance-cpu": "32",
          "karpenter_sh/capacity-type": "spot",
          "k8s_io/cloud-provider-aws": "RENAMED",
          "karpenter_k8s_aws/instance-size": "8xlarge",
          "karpenter_sh/initialized": "true",
          "node_kubernetes_io/instance-type": "m7g.8xlarge",
          "kubernetes_io/os": "linux",
          "karpenter_k8s_aws/instance-memory": "131072"
        }
      },
      "pod": {
        "uid": "87be4e23-42b0-4711-9e3b-4825ac4f993d",
        "ip": "RENAMED",
        "name": "core-utilities-service-f57df9956-l7kw2"
      },
      "namespace": "microservices",
      "replicaset": {
        "name": "core-utilities-service-f57df9956"
      },
      "namespace_uid": "RENAMED",
      "namespace_labels": {
        "goldilocks_fairwinds_com/enabled": "true",
        "kubernetes_io/metadata_name": "microservices"
      },
      "labels": {
        "app": "core-utilities",
        "tags_datadoghq_com/version": "61",
        "ELASTIC_APM_SERVICE_VERSION": "1",
        "admission_datadoghq_com/enabled": "true",
        "service": "core-utilities",
        "pod-template-hash": "RENAMED",
        "tags_datadoghq_com/service": "core-utilities",
        "app_kubernetes_io/name": "core-utilities",
        "app_kubernetes_io/instance": "core-utilities",
        "tags_datadoghq_com/env": "staging-cluster"
      }
    },
    "service.environment": "staging",
    "agent": {
      "name": "RENAMED",
      "id": "RENAMED",
      "ephemeral_id": "RENAMED",
      "type": "filebeat",
      "version": "8.13.4"
    },
    "log": {
      "file": {
        "inode": "139910199",
        "path": "/var/log/containers/core-utilities-service-f57df9956-l7kw2_microservices_core-utilities-c8e449fb43bccd51dca2f0cad83072f8d2af9433e497ea216ed09eedb507137c.log",
        "device_id": "RENAMED"
      },
      "offset": RENAMED
    },
    "elastic_agent": {
      "id": "e068c6c9-7f12-4d98-83a0-63142d360820",
      "version": "8.13.4",
      "snapshot": false
    },
    "message": "Health check log correlation example",
    "transaction.id": "1887bdfad0df6f9d",
    "cloud": {
      "availability_zone": "ap-south-1a",
      "image": {
        "id": "RENAMED"
      },
      "instance": {
        "id": "RENAMED"
      },
      "provider": "aws",
      "machine": {
        "type": "m7g.8xlarge"
      },
      "service": {
        "name": "EC2"
      },
      "region": "ap-south-1",
      "account": {
        "id": "RENAMED"
      }
    },
    "input": {
      "type": "filestream"
    },
    "trace.id": "2622a33f940097313c14911871f82174",
    "@timestamp": "2024-09-10T16:40:24.031Z",
    "ecs": {
      "version": "8.0.0"
    },
    "service.version": "61",
    "stream": "stdout",
    "data_stream": {
      "namespace": "staging",
      "type": "logs",
      "dataset": "kubernetes.container_logs"
    },
    "service": {
      "name": "core-utilities"
    },
    "@version": "1",
    "host": {
      "hostname": "RENAMED",
      "os": {
        "kernel": "RENAMED",
        "codename": "focal",
        "name": "Ubuntu",
        "type": "linux",
        "family": "debian",
        "version": "20.04.6 LTS (Focal Fossa)",
        "platform": "ubuntu"
      },
      "containerized": false,
      "name": "RENAMED",
      "id": "b6df0473492a47d6ac9a4d04dcb5c9aa",
      "architecture": "aarch64"
    },
    "json": {
      "dd": {
        "trace_id": "5963508668288421859",
        "span_id": "5963508668288421859",
        "service": "core-utilities",
        "env": "staging-cluster",
        "version": "61"
      },
      "@timestamp": "2024-09-10T16:40:24.030Z",
      "service.name": "core-utilities",
      "ecs.version": "8.10.0",
      "log.level": "info",
      "service.node.name": "core-utilities-service-f57df9956-l7kw2",
      "host.hostname": "core-utilities-service-f57df9956-l7kw2",
      "message": "Health check log correlation example",
      "process.pid": 1,
      "event.dataset": "core-utilities"
    },
    "event": {
      "agent_id_status": "auth_metadata_missing",
      "ingested": "2024-09-10T16:40:34Z",
      "dataset": "kubernetes.container_logs"
    }
  },
  "fields": {
    "kubernetes.node.uid": [
      "RENAMED"
    ],
    "elastic_agent.version": [
      "8.13.4"
    ],
    "kubernetes.namespace_uid": [
      "RENAMED"
    ],
    "host.os.name.text": [
      "Ubuntu"
    ],
    "kubernetes.node.labels.topology_kubernetes_io/zone": [
      "ap-south-1a"
    ],
    "kubernetes.labels.tags_datadoghq_com/env": [
      "staging-cluster"
    ],
    "kubernetes.namespace_labels.goldilocks_fairwinds_com/enabled": [
      "true"
    ],
    "host.hostname": [
      "RENAMED"
    ],
    "kubernetes.node.labels.kubernetes_io/os": [
      "linux"
    ],
    "container.id": [
      "c8e449fb43bccd51dca2f0cad83072f8d2af9433e497ea216ed09eedb507137c"
    ],
    "kubernetes.labels.service": [
      "core-utilities"
    ],
    "transaction.id": [
      "1887bdfad0df6f9d"
    ],
    "kubernetes.node.labels.karpenter_k8s_aws/instance-network-bandwidth": [
      "15000"
    ],
    "container.image.name": [
      "RENAMED"
    ],
    "host.os.version": [
      "20.04.6 LTS (Focal Fossa)"
    ],
    "kubernetes.labels.app": [
      "core-utilities"
    ],
    "cloud.service.name.text": [
      "EC2"
    ],
    "kubernetes.node.labels.beta_kubernetes_io/os": [
      "linux"
    ],
    "kubernetes.node.labels.karpenter_sh/nodepool": [
      "microservice-arm-nodes"
    ],
    "agent.name": [
      "RENAMED"
    ],
    "event.agent_id_status": [
      "auth_metadata_missing"
    ],
    "kubernetes.labels.app_kubernetes_io/name": [
      "core-utilities"
    ],
    "kubernetes.node.labels.topology_kubernetes_io/region": [
      "ap-south-1"
    ],
    "kubernetes.node.labels.karpenter_k8s_aws/instance-encryption-in-transit-supported": [
      "true"
    ],
    "cloud.region": [
      "ap-south-1"
    ],
    "host.os.type": [
      "linux"
    ],
    "kubernetes.node.labels.failure-domain_beta_kubernetes_io/zone": [
      "ap-south-1a"
    ],
    "input.type": [
      "filestream"
    ],
    "host.architecture": [
      "aarch64"
    ],
    "cloud.machine.type": [
      "m7g.8xlarge"
    ],
    "cloud.provider": [
      "aws"
    ],
    "agent.id": [
      "e068c6c9-7f12-4d98-83a0-63142d360820"
    ],
    "host.containerized": [
      false
    ],
    "kubernetes.node.labels.beta_kubernetes_io/instance-type": [
      "m7g.8xlarge"
    ],
    "kubernetes.node.labels.failure-domain_beta_kubernetes_io/region": [
      "ap-south-1"
    ],
    "kubernetes.node.hostname": [
      "RENAMED"
    ],
    "kubernetes.node.labels.topology_k8s_aws/zone-id": [
      "aps1-az1"
    ],
    "cloud.instance.id": [
      "RENAMED"
    ],
    "agent.type": [
      "filebeat"
    ],
    "stream": [
      "stdout"
    ],
    "elastic_agent.snapshot": [
      false
    ],
    "host.id": [
      "RENAMED"
    ],
    "kubernetes.pod.ip": [
      "RENAMED"
    ],
    "kubernetes.node.labels.karpenter_sh/capacity-type": [
      "spot"
    ],
    "service.environment": [
      "staging"
    ],
    "kubernetes.container.name": [
      "core-utilities"
    ],
    "elastic_agent.id": [
      "e068c6c9-7f12-4d98-83a0-63142d360820"
    ],
    "kubernetes.replicaset.name": [
      "core-utilities-service-f57df9956"
    ],
    "kubernetes.node.labels.karpenter_k8s_aws/instance-family": [
      "m7g"
    ],
    "host.os.codename": [
      "focal"
    ],
    "cloud.image.id": [
      "RENAMED"
    ],
    "kubernetes.node.labels.beta_kubernetes_io/arch": [
      "arm64"
    ],
    "event.ingested": [
      "2024-09-10T16:40:34.000Z"
    ],
    "kubernetes.node.labels.karpenter_k8s_aws/instance-generation": [
      "7"
    ],
    "@timestamp": [
      "2024-09-10T16:40:24.031Z"
    ],
    "service.version": [
      "61"
    ],
    "cloud.account.id": [
      "RENAMED"
    ],
    "host.os.platform": [
      "ubuntu"
    ],
    "log.file.inode": [
      "139910199"
    ],
    "data_stream.dataset": [
      "kubernetes.container_logs"
    ],
    "log.file.path": [
      "/var/log/containers/core-utilities-service-f57df9956-l7kw2_microservices_core-utilities-c8e449fb43bccd51dca2f0cad83072f8d2af9433e497ea216ed09eedb507137c.log"
    ],
    "agent.ephemeral_id": [
      "RENAMED"
    ],
    "kubernetes.node.labels.kubernetes_io/arch": [
      "arm64"
    ],
    "kubernetes.labels.ELASTIC_APM_SERVICE_VERSION": [
      "1"
    ],
    "kubernetes.labels.admission_datadoghq_com/enabled": [
      "true"
    ],
    "cloud.availability_zone": [
      "ap-south-1a"
    ],
    "kubernetes.labels.pod-template-hash": [
      "RENAMED"
    ],
    "kubernetes.node.labels.provisioner_cast_ai/node-id": [
      "RENAMED"
    ],
    "kubernetes.namespace": [
      "microservices"
    ],
    "host.os.name": [
      "Ubuntu"
    ],
    "host.name": [
      "RENAMED"
    ],
    "kubernetes.labels.tags_datadoghq_com/service": [
      "core-utilities"
    ],
    "kubernetes.labels.app_kubernetes_io/instance": [
      "core-utilities"
    ],
    "log.offset": [
      13818215
    ],
    "data_stream.type": [
      "logs"
    ],
    "kubernetes.node.labels.karpenter_k8s_aws/instance-cpu-manufacturer": [
      "aws"
    ],
    "container.runtime": [
      "containerd"
    ],
    "cloud.service.name": [
      "EC2"
    ],
    "ecs.version": [
      "8.0.0"
    ],
    "agent.version": [
      "8.13.4"
    ],
    "host.os.family": [
      "debian"
    ],
    "kubernetes.node.labels.karpenter_k8s_aws/instance-memory": [
      "131072"
    ],
    "kubernetes.node.name": [
      "RENAMED"
    ],
    "kubernetes.node.labels.topology_ebs_csi_aws_com/zone": [
      "ap-south-1a"
    ],
    "kubernetes.node.labels.karpenter_k8s_aws/instance-ebs-bandwidth": [
      "10000"
    ],
    "kubernetes.node.labels.karpenter_k8s_aws/instance-cpu": [
      "32"
    ],
    "kubernetes.node.labels.node_kubernetes_io/instance-type": [
      "m7g.8xlarge"
    ],
    "kubernetes.pod.uid": [
      "87be4e23-42b0-4711-9e3b-4825ac4f993d"
    ],
    "kubernetes.node.labels.karpenter_k8s_aws/instance-category": [
      "m"
    ],
    "kubernetes.node.labels.karpenter_sh/initialized": [
      "true"
    ],
    "trace.id": [
      "2622a33f940097313c14911871f82174"
    ],
    "host.os.kernel": [
      "RENAMED"
    ],
    "log.file.device_id": [
      "66320"
    ],
    "@version": [
      "1"
    ],
    "json": [
      {
        "dd": {
          "trace_id": "5963508668288421859",
          "span_id": "5963508668288421859",
          "service": "core-utilities",
          "env": "staging-cluster",
          "version": "61"
        },
        "@timestamp": "2024-09-10T16:40:24.030Z",
        "service.name": "core-utilities",
        "ecs.version": "8.10.0",
        "log.level": "info",
        "service.node.name": "core-utilities-service-f57df9956-l7kw2",
        "host.hostname": "core-utilities-service-f57df9956-l7kw2",
        "message": "Health check log correlation example",
        "process.pid": 1,
        "event.dataset": "core-utilities"
      }
    ],
    "kubernetes.pod.name": [
      "core-utilities-service-f57df9956-l7kw2"
    ],
    "kubernetes.node.labels.k8s_io/cloud-provider-aws": [
      "RENAMED"
    ],
    "service.name": [
      "core-utilities"
    ],
    "data_stream.namespace": [
      "staging"
    ],
    "json.message": [
      "Health check log correlation example"
    ],
    "kubernetes.namespace_labels.kubernetes_io/metadata_name": [
      "microservices"
    ],
    "kubernetes.node.labels.karpenter_k8s_aws/instance-hypervisor": [
      "nitro"
    ],
    "message": [
      "Health check log correlation example"
    ],
    "kubernetes.node.labels.karpenter_k8s_aws/instance-size": [
      "8xlarge"
    ],
    "kubernetes.node.labels.kubernetes_io/hostname": [
      "RENAMED"
    ],
    "kubernetes.labels.tags_datadoghq_com/version": [
      "61"
    ],
    "kubernetes.node.labels.karpenter_sh/registered": [
      "true"
    ],
    "event.dataset": [
      "kubernetes.container_logs"
    ]
  }
}

Furthermore,

We are running two clusters; the cluster in ap-south-1 is the one that has a remote cluster configured, for cross-cluster search from ap-south-2.

The screenshot I shared of correlation not working is from the cluster in ap-south-1.

I tried the same on the ap-south-2 cluster, which does not have cross-cluster search configured, and it is behaving strangely.

When the page first loads, it shows no log messages to display.

However, if I click "check for new data" 2-3 times, the logs load.

P.S. I tried a historic trace as well, and ensured the log entry is indexed; the behaviour is the same.

It is working on the ap-south-2 cluster.

I searched the same trace from the ap-south-1 cluster using cross-cluster search from ap-south-2.

I believe the OR bug occurs when cross-cluster search is enabled.

Log Setting:

Dataview:

APM indices settings:

logs-apm*,apm-*,dr:logs-apm*,dr:apm-*,*container_logs*,dr:logs-kubernetes.container_logs-production-dr

Hi @trentm,

I have identified the bug.

If I use index patterns directly in the log settings, correlation works and the OR query also works:

logs-kubernetes.container_logs-*,dr:logs-kubernetes.container_logs-*

However, if I create a data view using the same indices and use it as the Log sources, the OR query breaks and the trace log correlation also stops working.