Nacoma
March 9, 2023, 11:39pm
1
ElasticStack version :
8.6.0
Kibana version :
8.6.0
Elasticsearch version :
8.6.0
APM Server version :
8.5.1
APM Agent language and version :
PHP: 1.6.1
JS (apm-rum): 6.12.0
JS (apm-rum-core): 6.17.0
Fresh install or upgraded from other version?
Fresh install (restored from a snapshot of another fresh install on the same version).
Is there anything special in your setup? Not that I'm aware of.
Description of the problem including expected versus actual behavior. Please include screenshots (if relevant) :
We have been using the PHP agent successfully on this deployment for a few months now. This week, we added the RUM agent and our PHP transactions/traces are no longer available for all time periods. It looks like something in the RUM traces is causing the PHP traces/transactions to not load.
The API transactions are shown when restricting the time interval to 3 seconds:
Those traces disappear after increasing the date range:
Filtering for the service name will bring those traces back, as well:
Removing the filter for service.name removes all of those transactions, and only app transactions are shown again.
I'm seeing the same issue on the "Services" page:
Specifying the service.name in the filter shows latency/throughput for the api service:
I am unable to load ANY transactions from the "view service" page.
sqren
(Søren Louv Jansen)
March 10, 2023, 11:45am
2
In the screenshot I see 3 services. Are there services you are expecting to see that don't show up or what is the problem in this case?
sqren
(Søren Louv Jansen)
March 10, 2023, 11:47am
3
What happens if you clear the search bar?
Nacoma
March 10, 2023, 3:11pm
4
In the screenshot I see 3 services. Are there services you are expecting to see that don't show up or what is the problem in this case?
All three services (api, www, app) show up on the Services page. api/www do not show any throughput, latency, or error rate unless I search for service.name:"api" or service.name:"www".
What happens if you clear the search bar?
It's the exact same when the search bar is cleared.
sqren
(Søren Louv Jansen)
March 10, 2023, 9:04pm
5
It looks like the RUM agent (app) is ingesting transaction metrics but the PHP agent (api and www) is not.
Did you disable transaction metrics (aka aggregated metrics) in APM Server? If you can provide me with the APM Server configuration and the APM Server logs I might be able to figure out why this is happening.
Nacoma
March 10, 2023, 9:25pm
6
We aren't storing the agent logs at the moment, so I won't be able to provide those. The policy configuration for the backend is as follows:
revision: 15
outputs:
  default:
    type: elasticsearch
    hosts:
      - 'https://failover.es.us-east-1.aws.found.io:443'
output_permissions:
  default:
    _elastic_agent_monitoring:
      indices:
        - names:
            - metrics-elastic_agent.auditbeat-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.apm_server-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.endpoint_security-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.elastic_agent-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.cloudbeat-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.metricbeat-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.osquerybeat-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.packetbeat-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.heartbeat-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.filebeat-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-elastic_agent.fleet_server-production
          privileges:
            - auto_configure
            - create_doc
    _elastic_agent_checks:
      cluster:
        - monitor
    808d4e27-038b-469c-a345-26015fa77bb4:
      indices:
        - names:
            - logs-nginx.access-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-nginx.error-production
          privileges:
            - auto_configure
            - create_doc
    39fc54d6-c45b-44d3-a63c-7585925e5005:
      indices:
        - names:
            - logs-system.auth-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-system.syslog-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-system.cpu-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-system.diskio-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-system.filesystem-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-system.fsstat-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-system.load-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-system.memory-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-system.network-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-system.process-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-system.process.summary-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-system.socket_summary-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-system.uptime-production
          privileges:
            - auto_configure
            - create_doc
    419b5495-812c-4115-bf78-19222d79c2e5:
      indices:
        - names:
            - logs-generic-production
          privileges:
            - auto_configure
            - create_doc
    213815f0-a1f1-4caa-b9fa-e05c6622b52d:
      indices:
        - names:
            - metrics-docker.container-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-docker.cpu-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-docker.diskio-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-docker.event-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-docker.healthcheck-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-docker.info-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-docker.memory-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-docker.network-production
          privileges:
            - auto_configure
            - create_doc
    0eaa298c-ee6c-4705-b54a-59c2663a58b8:
      indices:
        - names:
            - logs-apm.app-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-apm.app.*-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-apm.error-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-apm.internal-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - traces-apm.rum-production
          privileges:
            - auto_configure
            - create_doc
        - names:
            - traces-apm.sampled-production
          privileges:
            - auto_configure
            - create_doc
            - maintenance
            - monitor
            - read
        - names:
            - traces-apm-production
          privileges:
            - auto_configure
            - create_doc
agent:
  download:
    sourceURI: 'https://artifacts.elastic.co/downloads/'
  monitoring:
    enabled: true
    use_output: default
    namespace: production
    logs: false
    metrics: true
inputs:
  - id: logfile-nginx-808d4e27-038b-469c-a345-26015fa77bb4
    name: nginx-3
    revision: 2
    type: logfile
    use_output: default
    meta:
      package:
        name: nginx
        version: 1.6.0
    data_stream:
      namespace: production
    package_policy_id: 808d4e27-038b-469c-a345-26015fa77bb4
    streams:
      - id: logfile-nginx.access-808d4e27-038b-469c-a345-26015fa77bb4
        data_stream:
          dataset: nginx.access
          type: logs
        exclude_files:
          - .gz$
        ignore_older: 72h
        paths:
          - /var/log/containers/nginx-proxy/access.log*
        processors:
          - add_locale: null
        tags:
          - nginx-access
      - id: logfile-nginx.error-808d4e27-038b-469c-a345-26015fa77bb4
        data_stream:
          dataset: nginx.error
          type: logs
        exclude_files:
          - .gz$
        ignore_older: 72h
        paths:
          - /var/log/containers/nginx-proxy/error.log*
        multiline:
          negate: true
          pattern: '^\d{4}\/\d{2}\/\d{2} '
          match: after
        processors:
          - drop_event:
              when:
                or:
                  - contains:
                      message: production.DEBUG
                  - contains:
                      message: production.INFO
          - add_locale: null
        tags:
          - nginx-error
  - id: logfile-system-39fc54d6-c45b-44d3-a63c-7585925e5005
    name: system-4
    revision: 2
    type: logfile
    use_output: default
    meta:
      package:
        name: system
        version: 1.20.4
    data_stream:
      namespace: production
    package_policy_id: 39fc54d6-c45b-44d3-a63c-7585925e5005
    streams:
      - id: logfile-system.auth-39fc54d6-c45b-44d3-a63c-7585925e5005
        data_stream:
          dataset: system.auth
          type: logs
        exclude_files:
          - .gz$
        ignore_older: 72h
        paths:
          - /var/log/auth.log*
          - /var/log/secure*
        multiline:
          pattern: ^\s
          match: after
        processors:
          - add_locale: null
        tags:
          - system-auth
      - id: logfile-system.syslog-39fc54d6-c45b-44d3-a63c-7585925e5005
        data_stream:
          dataset: system.syslog
          type: logs
        exclude_files:
          - .gz$
        ignore_older: 72h
        paths:
          - /var/log/messages*
          - /var/log/syslog*
        multiline:
          pattern: ^\s
          match: after
        processors:
          - add_locale: null
  - id: system/metrics-system-39fc54d6-c45b-44d3-a63c-7585925e5005
    name: system-4
    revision: 2
    type: system/metrics
    use_output: default
    meta:
      package:
        name: system
        version: 1.20.4
    data_stream:
      namespace: production
    package_policy_id: 39fc54d6-c45b-44d3-a63c-7585925e5005
    streams:
      - id: system/metrics-system.cpu-39fc54d6-c45b-44d3-a63c-7585925e5005
        data_stream:
          dataset: system.cpu
          type: metrics
        period: 10s
        cpu.metrics:
          - percentages
          - normalized_percentages
        metricsets:
          - cpu
      - id: system/metrics-system.diskio-39fc54d6-c45b-44d3-a63c-7585925e5005
        data_stream:
          dataset: system.diskio
          type: metrics
        period: 10s
        diskio.include_devices: null
        metricsets:
          - diskio
      - id: system/metrics-system.filesystem-39fc54d6-c45b-44d3-a63c-7585925e5005
        data_stream:
          dataset: system.filesystem
          type: metrics
        period: 1m
        metricsets:
          - filesystem
        processors:
          - drop_event.when.regexp:
              system.filesystem.mount_point: ^/(sys|cgroup|proc|dev|etc|host|lib|snap)($|/)
      - id: system/metrics-system.fsstat-39fc54d6-c45b-44d3-a63c-7585925e5005
        data_stream:
          dataset: system.fsstat
          type: metrics
        period: 1m
        metricsets:
          - fsstat
        processors:
          - drop_event.when.regexp:
              system.fsstat.mount_point: ^/(sys|cgroup|proc|dev|etc|host|lib|snap)($|/)
      - id: system/metrics-system.load-39fc54d6-c45b-44d3-a63c-7585925e5005
        data_stream:
          dataset: system.load
          type: metrics
        condition: '${host.platform} != ''windows'''
        period: 10s
        metricsets:
          - load
      - id: system/metrics-system.memory-39fc54d6-c45b-44d3-a63c-7585925e5005
        data_stream:
          dataset: system.memory
          type: metrics
        period: 10s
        metricsets:
          - memory
      - id: system/metrics-system.network-39fc54d6-c45b-44d3-a63c-7585925e5005
        data_stream:
          dataset: system.network
          type: metrics
        period: 10s
        network.interfaces: null
        metricsets:
          - network
      - id: system/metrics-system.process-39fc54d6-c45b-44d3-a63c-7585925e5005
        data_stream:
          dataset: system.process
          type: metrics
        process.include_top_n.by_memory: 5
        period: 10s
        processes:
          - .*
        process.include_top_n.by_cpu: 5
        process.cgroups.enabled: false
        process.cmdline.cache.enabled: true
        metricsets:
          - process
        process.include_cpu_ticks: false
      - id: >-
          system/metrics-system.process.summary-39fc54d6-c45b-44d3-a63c-7585925e5005
        data_stream:
          dataset: system.process.summary
          type: metrics
        period: 10s
        metricsets:
          - process_summary
      - id: >-
          system/metrics-system.socket_summary-39fc54d6-c45b-44d3-a63c-7585925e5005
        data_stream:
          dataset: system.socket_summary
          type: metrics
        period: 10s
        metricsets:
          - socket_summary
      - id: system/metrics-system.uptime-39fc54d6-c45b-44d3-a63c-7585925e5005
        data_stream:
          dataset: system.uptime
          type: metrics
        period: 10s
        metricsets:
          - uptime
  - id: logfile-logs-419b5495-812c-4115-bf78-19222d79c2e5
    name: log-3
    revision: 1
    type: logfile
    use_output: default
    meta:
      package:
        name: log
        version: 1.1.0
    data_stream:
      namespace: production
    package_policy_id: 419b5495-812c-4115-bf78-19222d79c2e5
    streams:
      - id: logfile-log.log-419b5495-812c-4115-bf78-19222d79c2e5
        data_stream:
          dataset: generic
        paths:
          - /var/log/containers/app/elk.log*
        processors:
          - add_fields:
              target: service
              fields:
                name: api
          - add_docker_metadata: null
        json.keys_under_root: true
        json.message_key: message
        json.add_error_key: true
  - id: docker/metrics-docker-213815f0-a1f1-4caa-b9fa-e05c6622b52d
    name: docker-3
    revision: 1
    type: docker/metrics
    use_output: default
    meta:
      package:
        name: docker
        version: 2.2.0
    data_stream:
      namespace: production
    package_policy_id: 213815f0-a1f1-4caa-b9fa-e05c6622b52d
    streams:
      - id: docker/metrics-docker.container-213815f0-a1f1-4caa-b9fa-e05c6622b52d
        data_stream:
          dataset: docker.container
          type: metrics
        metricsets:
          - container
        hosts:
          - 'unix:///var/run/docker.sock'
        period: 10s
        labels.dedot: true
      - id: docker/metrics-docker.cpu-213815f0-a1f1-4caa-b9fa-e05c6622b52d
        data_stream:
          dataset: docker.cpu
          type: metrics
        metricsets:
          - cpu
        hosts:
          - 'unix:///var/run/docker.sock'
        period: 10s
        labels.dedot: true
      - id: docker/metrics-docker.diskio-213815f0-a1f1-4caa-b9fa-e05c6622b52d
        data_stream:
          dataset: docker.diskio
          type: metrics
        metricsets:
          - diskio
        hosts:
          - 'unix:///var/run/docker.sock'
        period: 10s
        labels.dedot: true
        skip_major:
          - 9
          - 253
      - id: docker/metrics-docker.event-213815f0-a1f1-4caa-b9fa-e05c6622b52d
        data_stream:
          dataset: docker.event
          type: metrics
        metricsets:
          - event
        hosts:
          - 'unix:///var/run/docker.sock'
        period: 10s
        labels.dedot: true
      - id: docker/metrics-docker.healthcheck-213815f0-a1f1-4caa-b9fa-e05c6622b52d
        data_stream:
          dataset: docker.healthcheck
          type: metrics
        metricsets:
          - healthcheck
        hosts:
          - 'unix:///var/run/docker.sock'
        period: 10s
        labels.dedot: true
      - id: docker/metrics-docker.info-213815f0-a1f1-4caa-b9fa-e05c6622b52d
        data_stream:
          dataset: docker.info
          type: metrics
        metricsets:
          - info
        hosts:
          - 'unix:///var/run/docker.sock'
        period: 10s
      - id: docker/metrics-docker.memory-213815f0-a1f1-4caa-b9fa-e05c6622b52d
        data_stream:
          dataset: docker.memory
          type: metrics
        metricsets:
          - memory
        hosts:
          - 'unix:///var/run/docker.sock'
        period: 10s
        labels.dedot: true
      - id: docker/metrics-docker.network-213815f0-a1f1-4caa-b9fa-e05c6622b52d
        data_stream:
          dataset: docker.network
          type: metrics
        metricsets:
          - network
        hosts:
          - 'unix:///var/run/docker.sock'
        period: 10s
        labels.dedot: true
  - id: 0eaa298c-ee6c-4705-b54a-59c2663a58b8
    name: apm-3 (prod)
    revision: 2
    type: apm
    use_output: default
    meta:
      package:
        name: apm
        version: 8.6.0
    data_stream:
      namespace: production
    package_policy_id: 0eaa298c-ee6c-4705-b54a-59c2663a58b8
    apm-server:
      capture_personal_data: true
      max_connections: 0
      max_event_size: 307200
      auth:
        api_key:
          enabled: false
          limit: 100
        anonymous:
          enabled: true
          allow_agent: null
          allow_service: null
          rate_limit:
            ip_limit: 1000
            event_limit: 300
        secret_token: null
      default_service_environment: null
      shutdown_timeout: 30s
      sampling:
        tail:
          enabled: false
          storage_limit: 3GB
          policies:
            - sample_rate: 0.1
          interval: 1m
      aggregation:
        service:
          enabled: false
      rum:
        enabled: true
        exclude_from_grouping: ^/webpack
        allow_headers: null
        response_headers: null
        library_pattern: node_modules|bower_components|~
        allow_origins:
          - '*'
        source_mapping:
          metadata: []
      ssl:
        enabled: false
        key_passphrase: null
        certificate: null
        supported_protocols:
          - TLSv1.1
          - TLSv1.2
          - TLSv1.3
        curve_types: null
        key: null
        cipher_suites: null
      response_headers: null
      write_timeout: 30s
      pprof.enabled: false
      host: '0.0.0.0:8200'
      max_header_size: 1048576
      idle_timeout: 45s
      expvar.enabled: false
      read_timeout: 3600s
      java_attacher:
        enabled: false
        discovery-rules: null
        download-agent-version: null
      agent_config: []
fleet:
  hosts:
    - 'https://failover.fleet.us-east-1.aws.found.io:443'
The configuration for the frontend is:
revision: 17
outputs:
  es-containerhost:
    type: elasticsearch
    hosts:
      - 'http://552643323c9e4c06ba488dc604b2312a.containerhost:9244'
output_permissions:
  es-containerhost:
    _elastic_agent_monitoring:
      indices: []
    _elastic_agent_checks:
      cluster:
        - monitor
    elastic-cloud-apm:
      indices:
        - names:
            - logs-apm.app-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-apm.app.*-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - logs-apm.error-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - metrics-apm.internal-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - traces-apm.rum-default
          privileges:
            - auto_configure
            - create_doc
        - names:
            - traces-apm.sampled-default
          privileges:
            - auto_configure
            - create_doc
            - maintenance
            - monitor
            - read
        - names:
            - traces-apm-default
          privileges:
            - auto_configure
            - create_doc
agent:
  download:
    sourceURI: 'https://artifacts.elastic.co/downloads/'
  monitoring:
    enabled: false
    logs: false
    metrics: false
inputs:
  - id: fleet-server-fleet_server-elastic-cloud-fleet-server
    name: Fleet Server
    revision: 1
    type: fleet-server
    use_output: es-containerhost
    meta:
      package:
        name: fleet_server
        version: 1.2.0
    data_stream:
      namespace: default
    package_policy_id: elastic-cloud-fleet-server
    server:
      port: 8220
      host: 0.0.0.0
    server.runtime:
      gc_percent: 20
  - id: elastic-cloud-apm
    name: Elastic APM
    revision: 11
    type: apm
    use_output: es-containerhost
    meta:
      package:
        name: apm
        version: 8.6.0
    data_stream:
      namespace: default
    package_policy_id: elastic-cloud-apm
    apm-server:
      capture_personal_data: true
      max_connections: 0
      max_event_size: 307200
      auth:
        api_key:
          enabled: true
          limit: 100
        anonymous:
          enabled: true
          allow_agent:
            - rum-js
            - js-base
            - iOS/swift
          allow_service: null
          rate_limit:
            ip_limit: 1000
            event_limit: 300
        secret_token: Srh7Hl1oiTQO3izLVJ
      default_service_environment: null
      shutdown_timeout: 30s
      sampling:
        tail:
          enabled: false
          storage_limit: 3GB
          policies:
            - sample_rate: 0.1
          interval: 1m
      aggregation:
        service:
          enabled: false
      rum:
        enabled: true
        exclude_from_grouping: ^/webpack
        allow_headers:
          - Content-Type
        library_pattern: node_modules|bower_components|~
        allow_origins:
          - '*'
          - app.800.com
          - app.localhost.800.com
          - 800.com
          - 'app.localhost.800.com:3000'
          - 800andre-app.ngrok.io
        source_mapping:
          metadata: []
      ssl:
        enabled: true
        key_passphrase: null
        certificate: /app/config/certs/node.crt
        supported_protocols:
          - TLSv1.1
          - TLSv1.2
          - TLSv1.3
        curve_types: null
        key: /app/config/certs/node.key
        cipher_suites: null
      response_headers: null
      write_timeout: 30s
      pprof.enabled: false
      host: '0.0.0.0:8200'
      max_header_size: 1048576
      idle_timeout: 45s
      expvar.enabled: false
      read_timeout: 3600s
      java_attacher:
        enabled: false
        discovery-rules: null
        download-agent-version: null
      agent_config: []
fleet:
  hosts:
    - 'https://failover.fleet.us-east-1.aws.found.io:443'
sqren
(Søren Louv Jansen)
March 13, 2023, 12:34pm
7
Thanks @Nacoma. I'll ask a colleague with more knowledge of the policy config for help and will get back to you.
Nacoma
March 13, 2023, 4:13pm
8
@sqren Thank you. It may be worth noting that when I hover over the highlighted "Based on sample transactions" label, the tooltip reads as follows:
This page is using transaction event data as no metric events were found in the current time range, or a filter has been applied based on fields that are not available in metric event documents.
This message is only shown when the "missing" transactions are shown. If I recall correctly, this message was always shown before adding the app service's integration.
Nacoma
March 14, 2023, 9:56pm
9
Given the context provided by the notice, and the differences with what I'm seeing locally, it may be significant that there are no events that match processor.event:"metric" and metricset.name:"transaction" for any of the missing services. I'm still trying to determine why, but this feels important.
The agent is running the same software version, but there are zero transaction metric events in our cluster. I'm not sure what could cause this, or if those events are aggregated based on span_breakdowns during ingestion, or elsewhere.
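For anyone wanting to repeat this check outside of Kibana, here is a minimal sketch in Python of the query described above. It only builds the Elasticsearch request body and inspects a response; the function names, the `now-24h` default, and the sample response are illustrative assumptions, and the index pattern to send it to (for example something like `metrics-apm*`) is also an assumption that depends on the deployment.

```python
import json


def build_transaction_metric_query(service_names, time_range="now-24h"):
    """Build a search body that counts transaction metric documents per service.

    The field names (processor.event, metricset.name, service.name) are the
    ones used in this thread; the time range default is an assumption.
    """
    names = list(service_names)
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {
            "bool": {
                "filter": [
                    {"term": {"processor.event": "metric"}},
                    {"term": {"metricset.name": "transaction"}},
                    {"terms": {"service.name": names}},
                    {"range": {"@timestamp": {"gte": time_range}}},
                ]
            }
        },
        "aggs": {
            "by_service": {
                "terms": {"field": "service.name", "size": max(len(names), 1)}
            }
        },
    }


def services_without_metrics(response, expected):
    """Return the expected services that have no transaction metric documents."""
    buckets = response.get("aggregations", {}).get("by_service", {}).get("buckets", [])
    seen = {b["key"] for b in buckets if b.get("doc_count", 0) > 0}
    return [name for name in expected if name not in seen]


if __name__ == "__main__":
    services = ["api", "www", "app"]
    print(json.dumps(build_transaction_metric_query(services), indent=2))

    # Hypothetical response shape: only the RUM service has metric documents.
    sample_response = {
        "aggregations": {"by_service": {"buckets": [{"key": "app", "doc_count": 1}]}}
    }
    print(services_without_metrics(sample_response, services))
```

With a response shaped like the hypothetical sample, the helper reports `api` and `www` as having no transaction metrics, which matches the symptom described in this post.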
sqren
(Søren Louv Jansen)
March 16, 2023, 11:23am
10
Nacoma:
Given the context provided by the notice, and the differences with what I'm seeing locally, it may be significant that there are no events that match processor.event:"metric" and metricset.name:"transaction" for any of the missing services. I'm still trying to determine why, but this feels important.
Yes, this was the problem I mentioned earlier.
Can you please check that your RUM service has enabled tracestate propagation (see the docs). You can check this by looking at the headers on the outgoing requests from the browser. You should see something like:
Access-Control-Request-Headers: traceparent, tracestate
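If it is easier to check this from a captured request than by eye, a small sketch along these lines could do it. It takes the raw value of the `Access-Control-Request-Headers` header as a string and reports which of the two tracing headers are absent; the function name and the way the header value is obtained are assumptions for illustration only.

```python
def missing_trace_headers(header_value, required=("traceparent", "tracestate")):
    """Return the required tracing headers that are not listed in header_value.

    header_value is the comma-separated value of Access-Control-Request-Headers,
    e.g. copied from the browser's network tab. Header names are compared
    case-insensitively.
    """
    present = {h.strip().lower() for h in header_value.split(",") if h.strip()}
    return [h for h in required if h not in present]


if __name__ == "__main__":
    print(missing_trace_headers("traceparent, tracestate"))
    print(missing_trace_headers("content-type, traceparent"))
```

An empty list means both headers were requested; anything else names the header that is missing.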
Nacoma
March 16, 2023, 4:56pm
11
@sqren Our API server is configured to allow all headers, and distributed tracing works as expected.
Here we can see that the app service has correlated the api transaction, and that we see spans for database queries executed on the api. However, those api transactions are still not visible in the previously mentioned pages.
Traces that do not originate from the app service are not visible either.
Both headers are present in the requests made from the browser:
Nacoma
March 17, 2023, 5:20pm
12
After a lot of digging, I narrowed it down to the Fleet-managed agent. Upgrading it from 8.5.1 to 8.6.0 resolved the issue. Thank you for your help.
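In hindsight, the mismatch was already visible in the version list at the top of this thread (APM Server 8.5.1 against an 8.6.0 stack). A minimal sketch of a check that would have flagged it, assuming the versions have been collected by hand into a dictionary; the function names and dictionary keys are illustrative, not part of any Elastic API:

```python
def parse_version(version):
    """Turn a dotted version string such as '8.5.1' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def components_behind(versions, reference="elasticsearch"):
    """Return the components whose version is lower than the reference component."""
    target = parse_version(versions[reference])
    return {
        name: version
        for name, version in versions.items()
        if parse_version(version) < target
    }


if __name__ == "__main__":
    # Versions as reported in the first post of this thread.
    reported = {
        "elasticsearch": "8.6.0",
        "kibana": "8.6.0",
        "apm-server": "8.5.1",
    }
    print(components_behind(reported))
```

This only handles plain numeric versions; pre-release suffixes such as `-SNAPSHOT` would need extra parsing.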
system
(system)
Closed
April 7, 2023, 1:21pm
13
This topic was automatically closed 20 days after the last reply. New replies are no longer allowed.