APM on WebFlux not registering WebFilter code

Kibana version: 7.17

Elasticsearch version: 7.17

APM Server version: 7.17

APM Agent language and version: Java 1.49.0

Browser version: Chrome

Original install method (e.g. download page, yum, deb, from source, etc.) and version: programmatic attach

Fresh install or upgraded from other version?

Is there anything special in your setup? For example, are you using the Logstash or Kafka outputs? Are you using a load balancer in front of the APM Servers? Have you changed index pattern, generated custom templates, changed agent configuration etc.

Description of the problem including expected versus actual behavior. Please include screenshots (if relevant):
When using WebFlux, the code in a WebFilter is not registered in the traces.
We make some HTTP requests in the WebFilter, and they are not being recorded.
Here's how it shows up:
[Screenshot: the resulting trace]

Using the tracing API, ElasticApm.currentTransaction() returns a noopTransaction, so there's no transaction at that point; it only starts later.
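A minimal, stdlib-only Java sketch of the ordering described above. It assumes, as the no-op result suggests, that the transaction is only created once the request reaches the handler, after the WebFilter has already run. FilterOrderingDemo, ACTIVE and currentTransaction() are hypothetical stand-ins, not the Elastic agent's API, and the thread-local slot is an assumption about how an active transaction is tracked.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, not the Elastic agent: the handler opens the transaction,
// so a filter that runs before it only sees a no-op placeholder.
final class FilterOrderingDemo {
    static final String NOOP = "noop";

    // Assumed thread-local "active transaction" slot.
    static final ThreadLocal<String> ACTIVE = new ThreadLocal<>();

    // Analogue of ElasticApm.currentTransaction(): a no-op when nothing is active.
    static String currentTransaction() {
        String t = ACTIVE.get();
        return t != null ? t : NOOP;
    }

    static void handler(List<String> seen) {
        ACTIVE.set("GET /hello"); // transaction starts at handler level (assumed)
        try {
            seen.add("handler:" + currentTransaction());
        } finally {
            ACTIVE.remove();      // transaction ends when the handler finishes
        }
    }

    static void filter(List<String> seen, Runnable chain) {
        // Runs before the transaction exists, so only the no-op is visible here.
        seen.add("filter:" + currentTransaction());
        chain.run();
    }

    static List<String> run() {
        List<String> seen = new ArrayList<>();
        filter(seen, () -> handler(seen));
        return seen;
    }
}
```

Anything the filter does before calling the chain (such as its own HTTP calls) happens while only the no-op is current, so in this model nothing links that work to the handler's transaction.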

I tried starting a custom transaction that spans the whole filter (including the eventual controller call), but then no spans are registered except for the custom span shown in the screenshot. The HTTP requests made in the WebFilter are still not registered.
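One plausible explanation, offered as an assumption rather than a diagnosis: the Elastic API's activate() makes a transaction active on the current thread, while a reactive HTTP client typically performs the request on a different (event-loop) thread. The stdlib sketch below only shows that general effect; ThreadHopDemo and the single-thread executor standing in for an I/O thread are hypothetical.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration: a value made "active" in a thread-local on the
// filter's thread is not visible to work executed on another thread.
final class ThreadHopDemo {
    static final ThreadLocal<String> ACTIVE = new ThreadLocal<>();

    // Returns what a task running on a separate worker thread sees as active.
    static String activeSeenByWorker() {
        ExecutorService io = Executors.newSingleThreadExecutor(); // stands in for an I/O thread
        try {
            ACTIVE.set("custom-filter-transaction"); // analogous to activating it in the filter
            return CompletableFuture
                    .supplyAsync(() -> {
                        String t = ACTIVE.get(); // read on the worker thread
                        return t != null ? t : "none";
                    }, io)
                    .join();
        } finally {
            ACTIVE.remove();
            io.shutdown();
        }
    }
}
```

On a thread-per-request stack such as Spring Web MVC, the filter, the controller and a blocking HTTP client usually share one thread, which would be consistent with MVC projects behaving correctly.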


As you can see here, the whole request takes much longer than the 2 s shown in the original trace.

This works fine in Spring Web MVC projects.

Steps to reproduce:

  1. Create a WebFlux application with a WebFilter
  2. Check ElasticApm.currentTransaction() in the filter

Hi,

I think one easy thing to try would be to test the same scenario with the OpenTelemetry agent. Is that something you could try here?

If the OTel agent supports it, it could provide an easy workaround and maybe a long-term solution. If it does not, then we'll have to investigate further why this corner case is not properly covered.

Thanks for your response.

If I understand correctly from this document, switching the agent requires changes to the Elastic infrastructure. Sadly, that's not a viable option for our company at this point.

Do you think another workaround is possible? Why would starting a new transaction in the WebFilter still not register the HTTP requests made in that same WebFilter? Shouldn't the later transaction on the controller have the earlier one as its parent?
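For the parent/child question, the usual technique for asynchronous code is to capture the active context when work is scheduled and re-activate it on whichever thread eventually runs it. The sketch below shows that capture/restore pattern with the standard library only; ContextPropagation, wrap() and the string context are hypothetical. With the Elastic public API, the closest analogue would presumably be keeping a reference to the Transaction and calling activate() (closing the returned Scope afterwards) on the thread that performs the request, though whether that makes the agent record the WebClient spans in this case is untested.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of capture/restore context propagation across threads.
final class ContextPropagation {
    static final ThreadLocal<String> ACTIVE = new ThreadLocal<>();

    // Capture the context on the scheduling thread; restore it on the running thread.
    static Runnable wrap(Runnable task) {
        final String captured = ACTIVE.get();
        return () -> {
            String previous = ACTIVE.get();
            ACTIVE.set(captured);
            try {
                task.run();
            } finally {
                // Put back whatever the worker thread had before, so nothing leaks.
                if (previous == null) {
                    ACTIVE.remove();
                } else {
                    ACTIVE.set(previous);
                }
            }
        };
    }

    // Returns the context a child task sees when it is scheduled from the "filter".
    static String parentSeenByChild() {
        ExecutorService io = Executors.newSingleThreadExecutor();
        AtomicReference<String> seen = new AtomicReference<>();
        try {
            ACTIVE.set("filter-transaction");
            CompletableFuture.runAsync(wrap(() -> seen.set(ACTIVE.get())), io).join();
            return seen.get();
        } finally {
            ACTIVE.remove();
            io.shutdown();
        }
    }
}
```

Compared with an unwrapped task, which would see no active context on the worker thread, the wrapped task sees the filter's context and could therefore be attached to it as a child.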

Well, if using an OpenTelemetry agent is not possible, then the only solution is to identify and fix the issue in the agent. Could you provide a very simple application that reproduces the issue?

Also, I wasn't very clear: I was only suggesting checking whether the same behavior is observed with the OTel agent, not asking you to switch everything to it. So if you have a non-production environment to test in, that would be useful as well.

Last but not least, we need to know whether this only happens with our APM agent or also with the OpenTelemetry agent, since our long-term strategy will also rely on it. If the issue also happens with OTel, we should probably contribute a bug fix upstream too.

Here's a minimal reproduction example.

There's a controller and a WebFilter. Both make an HTTP request, and this is the result in APM:

Auto-generated transaction from the controller, showing only the controller's request:

Custom transaction starting in the filter and finishing after the controller returns. It does not register any of the requests:


Regarding the OTel test: I haven't used it before, so I will investigate whether there's an easy way to view traces without Elastic.

I managed to deploy Uptrace locally to test the OpenTelemetry integration, and it seems to be working as expected:

Here's the code:

I don't understand why you can't use the OTel agent. It's not a change to the Elastic infrastructure, only to the app startup and environment.

We are currently on Elastic 7.17.
When trying to export data to the APM Server, we get a 404 error:

WARN OkHttp http://apm-server.com/... i.o.e.internal.http.HttpExporter - Failed to export logs. Server responded with HTTP status code 404. Error message: Unable to parse response body, HTTP status message: Not Found

It looks like the request is being made to http://apm-server.com/v1/logs, which does not exist in this version. In version 8.13 these endpoints do seem to exist, and they are mentioned here.
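The /v1/logs path matches how OTLP/HTTP exporters derive per-signal URLs. Per the OpenTelemetry exporter specification, when only the base endpoint (OTEL_EXPORTER_OTLP_ENDPOINT) is set, the signal path v1/traces, v1/metrics or v1/logs is appended to it, whereas a per-signal variable such as OTEL_EXPORTER_OTLP_LOGS_ENDPOINT is used as-is. A small Java sketch of that rule; OtlpPaths, signalUrl and resolve are hypothetical names, not part of any SDK.

```java
// Hypothetical helper mirroring the OTLP/HTTP endpoint rule from the
// OpenTelemetry specification (not an SDK API).
final class OtlpPaths {
    // signal is one of "traces", "metrics", "logs".
    static String signalUrl(String baseEndpoint, String signal) {
        // Ensure exactly one slash between the base URL and the signal path.
        String base = baseEndpoint.endsWith("/") ? baseEndpoint : baseEndpoint + "/";
        return base + "v1/" + signal;
    }

    // A per-signal endpoint, when configured, is used unchanged.
    static String resolve(String baseEndpoint, String perSignalEndpoint, String signal) {
        return perSignalEndpoint != null ? perSignalEndpoint : signalUrl(baseEndpoint, signal);
    }
}
```

For example, a base of http://apm-server.com yields http://apm-server.com/v1/logs, the URL in the warning above. If a per-signal endpoint points at /intake/v2/events instead, the payload is still OTLP, while that endpoint is Elastic's own intake API expecting newline-delimited JSON, which would likely explain a 400 response.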

If we point it to /intake/v2/events (mentioned here), we get a 400 Bad Request response.

Server responded with HTTP status code 400. Error message: Unable to parse response body, HTTP status message: Bad Request

Do you have any suggestions or documentation for making it work with 7.17?

Thanks

I found that the OTLP/HTTP protocol is not supported in 7.17, so the only option is gRPC, which requires HTTPS.
Unfortunately, we don't use HTTPS to connect to the APM Server, so I don't see how to make it work without changing the infrastructure.