APM in Kibana not showing traces


ElasticStack version:
8.6.0

Kibana version:
8.6.0

Elasticsearch version:
8.6.0

APM Server version:
8.5.1

APM Agent language and version:
PHP: 1.6.1
JS (apm-rum): 6.12.0
JS (apm-rum-core): 6.17.0

Fresh install or upgraded from other version?
Fresh install (restored from a snapshot of another fresh install on the same version).

Is there anything special in your setup? Not that I'm aware of.

Description of the problem including expected versus actual behavior. Please include screenshots (if relevant):

We have been using the PHP agent successfully on this deployment for a few months now. This week, we added the RUM agent, and our PHP transactions/traces are no longer available for all time periods. It looks like something in the RUM traces is causing the PHP traces/transactions not to load.

The API transactions are shown when restricting the time interval to 3 seconds:

Those traces disappear after increasing the date range:

Filtering for the service name will bring those traces back, as well:

Removing the filter for service.name removes all of those transactions, and only app transactions are shown again.

I'm seeing the same issue on the "Services" page:

Specifying the service.name in the filter shows latency/throughput for the api service:

I am unable to load ANY transactions from the "view service" page.

In the screenshot I see 3 services. Are there services you are expecting to see that don't show up or what is the problem in this case?

What happens if you clear the search bar?

In the screenshot I see 3 services. Are there services you are expecting to see that don't show up or what is the problem in this case?

All three services (api, www, app) show up in the services page. api/www do not show any throughput, latency, or error rate unless I search for service.name:"api" or service.name:"www".

What happens if you clear the search bar?

It's the exact same when the search bar is cleared.

It looks like the RUM agent (app) is ingesting transaction metrics but the php agent (api and www) is not.

Did you disable transaction metrics (aka aggregated metrics) in APM Server? If you can provide me with the APM Server configuration and the APM Server logs I might be able to figure out why this is happening.

We aren't storing the agent logs at the moment, so I won't be able to provide those. The policy configuration for the backend is as follows:

revision: 15
outputs:
  default:
    type: elasticsearch
    hosts:
      - 'https://failover.es.us-east-1.aws.found.io:443'
output_permissions:
  default:
    _elastic_agent_monitoring:
      indices:
        - names:
            - metrics-elastic_agent.auditbeat-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.apm_server-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.endpoint_security-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.elastic_agent-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.cloudbeat-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.metricbeat-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.osquerybeat-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.packetbeat-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.heartbeat-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.filebeat-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.fleet_server-production
          privileges:
            - auto_configure
            - create_doc
    _elastic_agent_checks:
      cluster:
        - monitor
    808d4e27-038b-469c-a345-26015fa77bb4:
      indices:
        - names:
            - logs-nginx.access-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-nginx.error-production
          privileges:
            - auto_configure
            - create_doc
    39fc54d6-c45b-44d3-a63c-7585925e5005:
      indices:
        - names:
            - logs-system.auth-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-system.syslog-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-system.cpu-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-system.diskio-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-system.filesystem-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-system.fsstat-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-system.load-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-system.memory-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-system.network-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-system.process-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-system.process.summary-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-system.socket_summary-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-system.uptime-production
          privileges:
            - auto_configure
            - create_doc
    419b5495-812c-4115-bf78-19222d79c2e5:
      indices:
        - names:
            - logs-generic-production
          privileges:
            - auto_configure
            - create_doc
    213815f0-a1f1-4caa-b9fa-e05c6622b52d:
      indices:
        - names:
            - metrics-docker.container-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-docker.cpu-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-docker.diskio-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-docker.event-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-docker.healthcheck-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-docker.info-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-docker.memory-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-docker.network-production
          privileges:
            - auto_configure
            - create_doc
    0eaa298c-ee6c-4705-b54a-59c2663a58b8:
      indices:
        - names:
            - logs-apm.app-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-apm.app.*-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-apm.error-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-apm.internal-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - traces-apm.rum-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - traces-apm.sampled-production
          privileges:
            - auto_configure
            - create_doc
            - maintenance
            - monitor
            - read
        - names:
            - traces-apm-production
          privileges:
            - auto_configure
            - create_doc
agent:
  download:
    sourceURI: 'https://artifacts.elastic.co/downloads/'
  monitoring:
    enabled: true
    use_output: default
    namespace: production
    logs: false
    metrics: true
inputs:
  - id: logfile-nginx-808d4e27-038b-469c-a345-26015fa77bb4
    name: nginx-3
    revision: 2
    type: logfile
    use_output: default
    meta:
      package:
        name: nginx
        version: 1.6.0
    data_stream:
      namespace: production
    package_policy_id: 808d4e27-038b-469c-a345-26015fa77bb4
    streams:
      - id: logfile-nginx.access-808d4e27-038b-469c-a345-26015fa77bb4
        data_stream:
          dataset: nginx.access
          type: logs
        exclude_files:
          - .gz$
        ignore_older: 72h
        paths:
          - /var/log/containers/nginx-proxy/access.log*
        processors:
          - add_locale: null
        tags:
          - nginx-access
      - id: logfile-nginx.error-808d4e27-038b-469c-a345-26015fa77bb4
        data_stream:
          dataset: nginx.error
          type: logs
        exclude_files:
          - .gz$
        ignore_older: 72h
        paths:
          - /var/log/containers/nginx-proxy/error.log*
        multiline:
          negate: true
          pattern: '^\d{4}\/\d{2}\/\d{2} '
          match: after
        processors:
          - drop_event:
              when:
                or:
                  - contains:
                      message: production.DEBUG
                  - contains:
                      message: production.INFO
          - add_locale: null
        tags:
          - nginx-error
  - id: logfile-system-39fc54d6-c45b-44d3-a63c-7585925e5005
    name: system-4
    revision: 2
    type: logfile
    use_output: default
    meta:
      package:
        name: system
        version: 1.20.4
    data_stream:
      namespace: production
    package_policy_id: 39fc54d6-c45b-44d3-a63c-7585925e5005
    streams:
      - id: logfile-system.auth-39fc54d6-c45b-44d3-a63c-7585925e5005
        data_stream:
          dataset: system.auth
          type: logs
        exclude_files:
          - .gz$
        ignore_older: 72h
        paths:
          - /var/log/auth.log*
          - /var/log/secure*
        multiline:
          pattern: ^\s
          match: after
        processors:
          - add_locale: null
        tags:
          - system-auth
      - id: logfile-system.syslog-39fc54d6-c45b-44d3-a63c-7585925e5005
        data_stream:
          dataset: system.syslog
          type: logs
        exclude_files:
          - .gz$
        ignore_older: 72h
        paths:
          - /var/log/messages*
          - /var/log/syslog*
        multiline:
          pattern: ^\s
          match: after
        processors:
          - add_locale: null
  - id: system/metrics-system-39fc54d6-c45b-44d3-a63c-7585925e5005
    name: system-4
    revision: 2
    type: system/metrics
    use_output: default
    meta:
      package:
        name: system
        version: 1.20.4
    data_stream:
      namespace: production
    package_policy_id: 39fc54d6-c45b-44d3-a63c-7585925e5005
    streams:
      - id: system/metrics-system.cpu-39fc54d6-c45b-44d3-a63c-7585925e5005
        data_stream:
          dataset: system.cpu
          type: metrics
        period: 10s
        cpu.metrics:
          - percentages
          - normalized_percentages
        metricsets:
          - cpu
      - id: system/metrics-system.diskio-39fc54d6-c45b-44d3-a63c-7585925e5005
        data_stream:
          dataset: system.diskio
          type: metrics
        period: 10s
        diskio.include_devices: null
        metricsets:
          - diskio
      - id: system/metrics-system.filesystem-39fc54d6-c45b-44d3-a63c-7585925e5005
        data_stream:
          dataset: system.filesystem
          type: metrics
        period: 1m
        metricsets:
          - filesystem
        processors:
          - drop_event.when.regexp:
              system.filesystem.mount_point: ^/(sys|cgroup|proc|dev|etc|host|lib|snap)($|/)
      - id: system/metrics-system.fsstat-39fc54d6-c45b-44d3-a63c-7585925e5005
        data_stream:
          dataset: system.fsstat
          type: metrics
        period: 1m
        metricsets:
          - fsstat
        processors:
          - drop_event.when.regexp:
              system.fsstat.mount_point: ^/(sys|cgroup|proc|dev|etc|host|lib|snap)($|/)
      - id: system/metrics-system.load-39fc54d6-c45b-44d3-a63c-7585925e5005
        data_stream:
          dataset: system.load
          type: metrics
        condition: '${host.platform} != ''windows'''
        period: 10s
        metricsets:
          - load
      - id: system/metrics-system.memory-39fc54d6-c45b-44d3-a63c-7585925e5005
        data_stream:
          dataset: system.memory
          type: metrics
        period: 10s
        metricsets:
          - memory
      - id: system/metrics-system.network-39fc54d6-c45b-44d3-a63c-7585925e5005
        data_stream:
          dataset: system.network
          type: metrics
        period: 10s
        network.interfaces: null
        metricsets:
          - network
      - id: system/metrics-system.process-39fc54d6-c45b-44d3-a63c-7585925e5005
        data_stream:
          dataset: system.process
          type: metrics
        process.include_top_n.by_memory: 5
        period: 10s
        processes:
          - .*
        process.include_top_n.by_cpu: 5
        process.cgroups.enabled: false
        process.cmdline.cache.enabled: true
        metricsets:
          - process
        process.include_cpu_ticks: false
      - id: >-
          system/metrics-system.process.summary-39fc54d6-c45b-44d3-a63c-7585925e5005
        data_stream:
          dataset: system.process.summary
          type: metrics
        period: 10s
        metricsets:
          - process_summary
      - id: >-
          system/metrics-system.socket_summary-39fc54d6-c45b-44d3-a63c-7585925e5005
        data_stream:
          dataset: system.socket_summary
          type: metrics
        period: 10s
        metricsets:
          - socket_summary
      - id: system/metrics-system.uptime-39fc54d6-c45b-44d3-a63c-7585925e5005
        data_stream:
          dataset: system.uptime
          type: metrics
        period: 10s
        metricsets:
          - uptime
  - id: logfile-logs-419b5495-812c-4115-bf78-19222d79c2e5
    name: log-3
    revision: 1
    type: logfile
    use_output: default
    meta:
      package:
        name: log
        version: 1.1.0
    data_stream:
      namespace: production
    package_policy_id: 419b5495-812c-4115-bf78-19222d79c2e5
    streams:
      - id: logfile-log.log-419b5495-812c-4115-bf78-19222d79c2e5
        data_stream:
          dataset: generic
        paths:
          - /var/log/containers/app/elk.log*
        processors:
          - add_fields:
              target: service
              fields:
                name: api
          - add_docker_metadata: null
        json.keys_under_root: true
        json.message_key: message
        json.add_error_key: true
  - id: docker/metrics-docker-213815f0-a1f1-4caa-b9fa-e05c6622b52d
    name: docker-3
    revision: 1
    type: docker/metrics
    use_output: default
    meta:
      package:
        name: docker
        version: 2.2.0
    data_stream:
      namespace: production
    package_policy_id: 213815f0-a1f1-4caa-b9fa-e05c6622b52d
    streams:
      - id: docker/metrics-docker.container-213815f0-a1f1-4caa-b9fa-e05c6622b52d
        data_stream:
          dataset: docker.container
          type: metrics
        metricsets:
          - container
        hosts:
          - 'unix:///var/run/docker.sock'
        period: 10s
        labels.dedot: true
      - id: docker/metrics-docker.cpu-213815f0-a1f1-4caa-b9fa-e05c6622b52d
        data_stream:
          dataset: docker.cpu
          type: metrics
        metricsets:
          - cpu
        hosts:
          - 'unix:///var/run/docker.sock'
        period: 10s
        labels.dedot: true
      - id: docker/metrics-docker.diskio-213815f0-a1f1-4caa-b9fa-e05c6622b52d
        data_stream:
          dataset: docker.diskio
          type: metrics
        metricsets:
          - diskio
        hosts:
          - 'unix:///var/run/docker.sock'
        period: 10s
        labels.dedot: true
        skip_major:
          - 9
          - 253
      - id: docker/metrics-docker.event-213815f0-a1f1-4caa-b9fa-e05c6622b52d
        data_stream:
          dataset: docker.event
          type: metrics
        metricsets:
          - event
        hosts:
          - 'unix:///var/run/docker.sock'
        period: 10s
        labels.dedot: true
      - id: docker/metrics-docker.healthcheck-213815f0-a1f1-4caa-b9fa-e05c6622b52d
        data_stream:
          dataset: docker.healthcheck
          type: metrics
        metricsets:
          - healthcheck
        hosts:
          - 'unix:///var/run/docker.sock'
        period: 10s
        labels.dedot: true
      - id: docker/metrics-docker.info-213815f0-a1f1-4caa-b9fa-e05c6622b52d
        data_stream:
          dataset: docker.info
          type: metrics
        metricsets:
          - info
        hosts:
          - 'unix:///var/run/docker.sock'
        period: 10s
      - id: docker/metrics-docker.memory-213815f0-a1f1-4caa-b9fa-e05c6622b52d
        data_stream:
          dataset: docker.memory
          type: metrics
        metricsets:
          - memory
        hosts:
          - 'unix:///var/run/docker.sock'
        period: 10s
        labels.dedot: true
      - id: docker/metrics-docker.network-213815f0-a1f1-4caa-b9fa-e05c6622b52d
        data_stream:
          dataset: docker.network
          type: metrics
        metricsets:
          - network
        hosts:
          - 'unix:///var/run/docker.sock'
        period: 10s
        labels.dedot: true
  - id: 0eaa298c-ee6c-4705-b54a-59c2663a58b8
    name: apm-3 (prod)
    revision: 2
    type: apm
    use_output: default
    meta:
      package:
        name: apm
        version: 8.6.0
    data_stream:
      namespace: production
    package_policy_id: 0eaa298c-ee6c-4705-b54a-59c2663a58b8
    apm-server:
      capture_personal_data: true
      max_connections: 0
      max_event_size: 307200
      auth:
        api_key:
          enabled: false
          limit: 100
        anonymous:
          enabled: true
          allow_agent: null
          allow_service: null
          rate_limit:
            ip_limit: 1000
            event_limit: 300
        secret_token: null
      default_service_environment: null
      shutdown_timeout: 30s
      sampling:
        tail:
          enabled: false
          storage_limit: 3GB
          policies:
            - sample_rate: 0.1
          interval: 1m
      aggregation:
        service:
          enabled: false
      rum:
        enabled: true
        exclude_from_grouping: ^/webpack
        allow_headers: null
        response_headers: null
        library_pattern: node_modules|bower_components|~
        allow_origins:
          - '*'
        source_mapping:
          metadata: []
      ssl:
        enabled: false
        key_passphrase: null
        certificate: null
        supported_protocols:
          - TLSv1.1
          - TLSv1.2
          - TLSv1.3
        curve_types: null
        key: null
        cipher_suites: null
      response_headers: null
      write_timeout: 30s
      pprof.enabled: false
      host: '0.0.0.0:8200'
      max_header_size: 1048576
      idle_timeout: 45s
      expvar.enabled: false
      read_timeout: 3600s
      java_attacher:
        enabled: false
        discovery-rules: null
        download-agent-version: null
      agent_config: []
fleet:
  hosts:
    - 'https://failover.fleet.us-east-1.aws.found.io:443'

The configuration for the frontend is

revision: 17
outputs:
  es-containerhost:
    type: elasticsearch
    hosts:
      - 'http://552643323c9e4c06ba488dc604b2312a.containerhost:9244'
output_permissions:
  es-containerhost:
    _elastic_agent_monitoring:
      indices: []
    _elastic_agent_checks:
      cluster:
        - monitor
    elastic-cloud-apm:
      indices:
        - names:
            - logs-apm.app-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-apm.app.*-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-apm.error-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-apm.internal-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - traces-apm.rum-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - traces-apm.sampled-default
          privileges:
            - auto_configure
            - create_doc
            - maintenance
            - monitor
            - read
        - names:
            - traces-apm-default
          privileges:
            - auto_configure
            - create_doc
agent:
  download:
    sourceURI: 'https://artifacts.elastic.co/downloads/'
  monitoring:
    enabled: false
    logs: false
    metrics: false
inputs:
  - id: fleet-server-fleet_server-elastic-cloud-fleet-server
    name: Fleet Server
    revision: 1
    type: fleet-server
    use_output: es-containerhost
    meta:
      package:
        name: fleet_server
        version: 1.2.0
    data_stream:
      namespace: default
    package_policy_id: elastic-cloud-fleet-server
    server:
      port: 8220
      host: 0.0.0.0
    server.runtime:
      gc_percent: 20
  - id: elastic-cloud-apm
    name: Elastic APM
    revision: 11
    type: apm
    use_output: es-containerhost
    meta:
      package:
        name: apm
        version: 8.6.0
    data_stream:
      namespace: default
    package_policy_id: elastic-cloud-apm
    apm-server:
      capture_personal_data: true
      max_connections: 0
      max_event_size: 307200
      auth:
        api_key:
          enabled: true
          limit: 100
        anonymous:
          enabled: true
          allow_agent:
            - rum-js
            - js-base
            - iOS/swift
          allow_service: null
          rate_limit:
            ip_limit: 1000
            event_limit: 300
        secret_token: Srh7Hl1oiTQO3izLVJ
      default_service_environment: null
      shutdown_timeout: 30s
      sampling:
        tail:
          enabled: false
          storage_limit: 3GB
          policies:
            - sample_rate: 0.1
          interval: 1m
      aggregation:
        service:
          enabled: false
      rum:
        enabled: true
        exclude_from_grouping: ^/webpack
        allow_headers:
          - Content-Type
        library_pattern: node_modules|bower_components|~
        allow_origins:
          - '*'
          - app.800.com
          - app.localhost.800.com
          - 800.com
          - 'app.localhost.800.com:3000'
          - 800andre-app.ngrok.io
        source_mapping:
          metadata: []
      ssl:
        enabled: true
        key_passphrase: null
        certificate: /app/config/certs/node.crt
        supported_protocols:
          - TLSv1.1
          - TLSv1.2
          - TLSv1.3
        curve_types: null
        key: /app/config/certs/node.key
        cipher_suites: null
      response_headers: null
      write_timeout: 30s
      pprof.enabled: false
      host: '0.0.0.0:8200'
      max_header_size: 1048576
      idle_timeout: 45s
      expvar.enabled: false
      read_timeout: 3600s
      java_attacher:
        enabled: false
        discovery-rules: null
        download-agent-version: null
      agent_config: []
fleet:
  hosts:
    - 'https://failover.fleet.us-east-1.aws.found.io:443'

Thanks @Nacoma . I'll try to ask a colleague with more knowledge of the policy config for help and will get back to you.

@sqren Thank you. It may be worth noting that when hovering over the highlighted "Based on sample transactions" label, the tooltip message is as follows:

This page is using transaction event data as no metric events were found in the current time range, or a filter has been applied based on fields that are not available in metric event documents.

This message is only shown when the "missing" transactions are shown. If I recall correctly, this message was always shown before adding the app service's integration.

Given the context provided by the notice, and the differences with what I'm seeing locally, it may be significant that there are no events that match processor.event:"metric" and metricset.name:"transaction" for any of the missing services. I'm still trying to determine why, but this feels important.

The agent is running the same software version, but there are zero transaction metric events in our cluster. I'm not sure what could cause this, or if those events are aggregated based on span_breakdowns during ingestion, or elsewhere.
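
For anyone who wants to repeat that check outside Kibana, here is a minimal sketch of a search body that counts transaction metric documents per service. The field names are the ones from the filter above plus service.name; the helper name is hypothetical, and the index pattern you send it to (for example something like metrics-apm*) is an assumption about where your APM metrics data streams live.

```python
import json


def transaction_metric_count_query(service_names):
    """Build a search body that counts transaction metric documents per service.

    Hypothetical helper: only the field names (processor.event,
    metricset.name, service.name) come from the thread.
    """
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "query": {
            "bool": {
                "filter": [
                    {"term": {"processor.event": "metric"}},
                    {"term": {"metricset.name": "transaction"}},
                    {"terms": {"service.name": list(service_names)}},
                ]
            }
        },
        "aggs": {
            # one bucket per service that has at least one matching document
            "per_service": {"terms": {"field": "service.name"}}
        },
    }


if __name__ == "__main__":
    body = transaction_metric_count_query(["api", "www", "app"])
    print(json.dumps(body, indent=2))
```

A service that is listed in the filter but missing from the per_service buckets has no transaction metric documents in the queried time range, which is the situation described for api and www.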

Yes, this was the problem I mentioned earlier.

Can you please check that your RUM service has enabled tracestate propagation (see the docs)? You can check this by looking at the headers on the outgoing requests from the browser. You should see something like:

Access-Control-Request-Headers: traceparent, tracestate
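
If you would rather check a captured header value than eyeball it, a small sketch like the following reports which of the two header names are missing. The function name is hypothetical; the only inputs assumed are the raw Access-Control-Request-Headers value copied from the browser's network tab and the two names shown above.

```python
REQUIRED_TRACE_HEADERS = ("traceparent", "tracestate")


def missing_trace_headers(access_control_request_headers):
    """Return the required trace header names absent from the header value.

    The value is a comma-separated list; header names are compared
    case-insensitively and surrounding whitespace is ignored.
    """
    present = {
        name.strip().lower()
        for name in access_control_request_headers.split(",")
        if name.strip()
    }
    return [name for name in REQUIRED_TRACE_HEADERS if name not in present]


if __name__ == "__main__":
    print(missing_trace_headers("traceparent, tracestate"))
    print(missing_trace_headers("content-type, traceparent"))
```

An empty list means both names were requested in the preflight; any name in the returned list is one the browser did not ask to send.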

@sqren Our API server is configured to allow all headers, and distributed tracing works as expected.

Here we can see that the app service has correlated the api transaction, and that we see spans for database queries executed on the api. However, those api transactions are still not visible in the previously mentioned pages.

Traces that do not originate from the app service are not visible either.

Both headers are present in the requests made from the browser:


After a lot of digging, I narrowed it down to the Fleet-managed agent. Upgrading it from 8.5.1 to 8.6.0 resolved the issue. Thank you for your help.
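
For reference, the mismatch was already visible in the version list at the top of the thread. Below is a minimal sketch of a check that flags components running behind the rest of the stack. The helper names are hypothetical, and it assumes plain MAJOR.MINOR.PATCH version strings; it only detects the skew and says nothing about why the older version failed to produce the metrics.

```python
def parse_version(text):
    """Turn a 'MAJOR.MINOR.PATCH' string into a tuple of ints for comparison."""
    return tuple(int(part) for part in text.split("."))


def components_behind(stack_version, component_versions):
    """Return the components whose version is lower than stack_version."""
    target = parse_version(stack_version)
    return {
        name: version
        for name, version in component_versions.items()
        if parse_version(version) < target
    }


if __name__ == "__main__":
    # Versions as reported at the top of this thread.
    reported = {
        "Kibana": "8.6.0",
        "Elasticsearch": "8.6.0",
        "APM Server": "8.5.1",
    }
    print(components_behind("8.6.0", reported))
```

With the versions reported here, the only component flagged is APM Server at 8.5.1.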
