All spans grouped into "ServletWrappingController"

fuleow · October 30, 2020, 1:47am

We are instrumenting our Spring Boot services using the latest Elastic APM Agent and in Kibana the traces are all grouped by their parent spans. Unfortunately this makes almost all spans grouped under "ServletWrappingController" which is not very helpful. Is there a way to rename the parent span so this is more meaningful?

Some of our services are instrumented with the OpenTelemetry agent, which allows the parent span to be renamed. This helps us group traces more logically, based on the API and method being called.

The OpenTelemetry docs acknowledge and address this:

The state described above has one significant problem. Observability backends usually aggregate traces based on their root spans. This means that ALL traces from any application deployed to Servlet container will be grouped together. Because their root spans will all have the same named based on common entry point. In order to alleviate this problem, instrumentations for specific frameworks, such as Spring MVC here, update name of the span corresponding to the entry point. Each framework instrumentation can decide what is the best span name based on framework implementation details. Of course, still adhering to OpenTelemetry semantic conventions.

felixbarny · October 30, 2020, 7:28am

We pretty much do the same. For Spring MVC we also set the transaction name based on the MVC controller that handles the request. If it's a custom or unsupported framework, you can use the API to set the name of the transaction.
Which framework are you using? Possibly it's not much effort to add auto-instrumentation for it.

felixbarny · October 30, 2020, 7:39am

I think the issue is that we give precedence to Spring controllers/HandlerMethods as they are usually more descriptive than the DispatcherServlet that invokes them. But in this case, ServletWrappingController is invoking another servlet whose name is even more appropriate.

We could either have a special case for ServletWrappingController or, if you don't want any transactions named after Spring MVC controllers, you can also disable the spring-mvc instrumentation.

felixbarny · October 30, 2020, 11:24am

I've added support for ServletWrappingController: https://github.com/elastic/apm-agent-java/pull/1461

Could you try if the approach works for you? Here are the build artifacts of that PR:

fuleow · October 30, 2020, 6:31pm

Thanks for the quick reply and PR @felixbarny. These are jsonRPC calls being handled by our custom library so the meaningful values will be in the request body. However I think your change is still useful for other services.
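
For the JSON-RPC case, where the meaningful value is in the request body, the API route mentioned earlier in the thread could look roughly like the sketch below. It is only an illustration: the `JsonRpcTransactionNames` class, the `"JSON-RPC "` name prefix, and the regex-based extraction are assumptions, not part of any agent or of our library. A real service would read the method from its own JSON parser rather than a regex, which can also match a `"method"` key nested inside the params.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: derives a transaction name from a JSON-RPC request body. */
final class JsonRpcTransactionNames {

    // Matches the first "method": "<name>" pair in the body.
    // Assumption: the body is small and the first "method" key is the top-level one.
    private static final Pattern METHOD =
            Pattern.compile("\"method\"\\s*:\\s*\"([^\"\\\\]+)\"");

    private JsonRpcTransactionNames() {}

    /** Returns "JSON-RPC <method>", or a fallback name when no method field is found. */
    static String nameFor(String requestBody) {
        if (requestBody != null) {
            Matcher m = METHOD.matcher(requestBody);
            if (m.find()) {
                return "JSON-RPC " + m.group(1);
            }
        }
        return "JSON-RPC unknown";
    }

    public static void main(String[] args) {
        String body = "{\"jsonrpc\":\"2.0\",\"method\":\"user.get\",\"params\":[42],\"id\":1}";
        String name = nameFor(body);
        System.out.println(name);
        // With the Elastic APM public API on the classpath, the name would be applied with:
        //   co.elastic.apm.api.ElasticApm.currentTransaction().setName(name);
        // With the OpenTracing API, the equivalent call on the active span is:
        //   span.setOperationName(name);
    }
}
```

Keeping the name to the method only (no params or ids) keeps the number of distinct transaction names small, which is what makes the grouping in Kibana useful.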

We are actually using the opentracing-api instead of apm-agent-api directly since there are shims available from both Elastic APM and OpenTelemetry for the OpenTracing Tracer.

We are forced to use a mixed setup because we had scaling issues using the Elastic APM Java Agent on our very high rps services (40,000+ rps). Even with a low sample rate the high level transactions are still reported to the APM server and that slowed things down significantly (see https://github.com/elastic/apm/issues/104 and https://github.com/elastic/apm/issues/151).

Switching to the OTEL or Jaeger agent and exporting to APM Server's Jaeger endpoint works, but it is a hacky solution. I'm not sure if things have changed recently, but if the Java agent could send only sampled transactions instead of all transactions, that would make things scale much more easily.

felixbarny · October 30, 2020, 9:43pm

We'll add some experimental options to calculate metrics based on transactions in the upcoming 7.11 release. Be sure to try that out and give us feedback.

Eyal_Koren · November 1, 2020, 7:03am

Can you elaborate on that a bit? What has slowed down? Did you experience higher latencies in your application endpoints, or did you observe the effect only on the ingestion pipeline (agent -> APM Server -> ES)? Did you try a VERY low sample rate (e.g. 0.0 - 0.001) to validate that the overhead is indeed related to ingestion and not to the instrumentation/tracing overhead?

fuleow · November 2, 2020, 5:41pm

We had the service's sample rate set at 0.01% and there were no issues there. The slowdown was in the ingestion pipeline because billions of events were being created every day for the top level transaction information. Issue 151 and the subsequent discussions provide a lot of details https://github.com/elastic/apm/issues/151. It deals with the node agent but we saw the same behavior with Java.

The APM Server's UI shows the number of transactions in each latency bucket including ones which weren't sampled and it also gives overall latency numbers. We don't really need this information since we have other tools like Prometheus to capture histogram buckets of request latencies. A random sample will also approximate the correct distribution in APM without needing to record data from all transactions.
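
To illustrate the point about random samples, here is a small simulation, not a measurement from our services. The latencies are synthetic (exponentially distributed with an assumed 20 ms mean), the class and method names are made up for the example, and it samples at 1% rather than our 0.01% so that one million simulated requests still leave enough samples to compare percentiles.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical demo: percentiles of a random sample vs. percentiles of all requests. */
final class SamplingDemo {

    private SamplingDemo() {}

    /** Returns the q-quantile (q in 0..1) using nearest-rank on a sorted copy. */
    static double quantile(double[] values, double q) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no values");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int idx = (int) Math.min(sorted.length - 1, Math.floor(q * sorted.length));
        return sorted[idx];
    }

    /** Keeps each value independently with probability {@code rate}. */
    static double[] sample(double[] values, double rate, Random rnd) {
        return Arrays.stream(values).filter(v -> rnd.nextDouble() < rate).toArray();
    }

    /** Synthetic latencies in ms, exponentially distributed with the given mean. */
    static double[] syntheticLatencies(int n, double meanMs, Random rnd) {
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            // Inverse-CDF sampling; 1 - nextDouble() is in (0, 1], so log() is finite.
            out[i] = -meanMs * Math.log(1.0 - rnd.nextDouble());
        }
        return out;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        double[] all = syntheticLatencies(1_000_000, 20.0, rnd);
        double[] sampled = sample(all, 0.01, rnd);
        System.out.printf("all:     n=%d p50=%.2f p95=%.2f%n",
                all.length, quantile(all, 0.5), quantile(all, 0.95));
        System.out.printf("sampled: n=%d p50=%.2f p95=%.2f%n",
                sampled.length, quantile(sampled, 0.5), quantile(sampled, 0.95));
    }
}
```

The sample only approximates the shape of the latency distribution. Absolute counts such as throughput would still have to be scaled by the inverse of the sample rate or taken from another source, which for us is Prometheus.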

Eyal_Koren · November 3, 2020, 6:12am

Thanks for the details. It validates our efforts towards not sending unsampled transactions (relying on Elasticsearch's new histogram data type instead) and smarter, tail-based, sampling.
One thing I am still missing is whether or not you observed overhead in your application's endpoint latencies with the higher sampling rate, or any other noticeable overhead on CPU or memory (on the agent side).

fuleow · November 3, 2020, 7:05pm

I don't think there was any noticeable overhead on the application with sampling at under 1%. We did not attempt sampling at a higher rate because it would cause issues with ingestion.

Is this something that can be enabled on the agent right now?

Eyal_Koren · November 4, 2020, 3:53am

Not yet. It has multiple dependencies, but it is WIP.

felixbarny · November 4, 2020, 7:30am

Have you tried dropping non-sampled transactions with an APM Server processor or with an ingest node processor?

The RPM graph will be off but it might be a decent short-term solution for you.

system · November 25, 2020, 3:31am

This topic was automatically closed 20 days after the last reply. New replies are no longer allowed.
