Automatic transaction time not aligned with real execution time

Kibana version: 7.6.2

Elasticsearch version: 7.6.2

APM Server version: 7.9.0

APM Agent language and version: PHP commit 8eccf0b4657533d1848d5541a250096d2f436482

Browser version: N/A

PHP version: 7.2.24-0ubuntu0.18.04.6

Original install method (e.g. download page, yum, deb, from source, etc.) and version: From source

Fresh install or upgraded from other version? Fresh install

Is there anything special in your setup? For example, are you using the Logstash or Kafka outputs? Are you using a load balancer in front of the APM Servers? Have you changed index pattern, generated custom templates, changed agent configuration etc.

Description of the problem including expected versus actual behavior. Please include screenshots (if relevant): I expect transactions created by extension should cover the entire execution. However, the transaction ends within a few ms while the API actually takes a few hundreds of ms. I tried to create a span inside a laravel middleware (using captureChildSpan public API) and its time span makes much more sense than the transaction itself.

Steps to reproduce:

  1. Clone PHP agent from the source. Build, install, config elastic_apm.ini.
  2. Enable config for php-fpm and restart php-fpm service.
  3. Observe results in Kibana.

Errors in browser console (if relevant): N/A

Provide logs and/or server output (if relevant): N/A Let me know if anything is needed.

I am new to Elastic APM and APM in general. Please correct me if I am wrong.

Is the current behavior the expected behavior? Is it due to the fact that the agent is still under development (I don't see an issue related to this opened though)?

Not sure what's the expected behavior so I decided to open a topic in discuss first.

Hi @zinsserzh and welcome to the forum!

One of the main purposes of APM is to let application developer to find (and hopefully fix :)) root causes for latency experienced by application users. In case of a PHP web application that latency might include sources that cannot be detected and measured from inside the application. For example, the time it takes for all the extensions to execute initialize-for-each-request stage (technically it's called RINIT), time it takes PHP engine to load and process the source (or fetch opcode from the cache), etc. So if you want to investigate correctness/suitability of an APM solution I would recommend comparing times reported by APM against the times from the end user perspective. For example you can use browser's development tools to get request duration.

As for your particular case we cannot blame things such extensions initialization, etc. for the latency since the graph clearly shows that 75% of the latency was spent in an outgoing HTTP call. It's incorrectly named curl_exec - I've opened issue #151 to have the agent follow the current Elastic APM spec for the agents. Knowing to where this HTTP call is going should give quite a good clue as why your transaction spends so much time on it. I will try to give issue #151 high priority but in the meantime I created a quick temporary solution that you can try out (PR #152) - it adds URL label to curl_exec spans.

If you are interested in implementation internals, the way Elastic APM PHP Agent measures transaction duration from the point of view of PHP extension (you can see the details in "Learning the PHP lifecycle" section of "PHP Internals Book") from RINIT until RSHUTDOWN.

Hi @Sergey_Kleyman

Thanks for the reply. Appreciate your effort to open an issue for curl tracking.

Is the transaction only meant to capture extension initialization time? Or the entire request processing time?

My original question was about the time elapsed for the entire transaction. In the diagram, laravel is a span captured by PHP code via public API captureChildSpan. It is NOT the transaction automatically captured by APM agent.

The transaction automatically captured by the agent only ran for 2688us, which is clearly not the actual time spent. The second curl was accessing another micro-service which I intentionally added some delay to illustrate timing. Client side in this case reports ~400ms which is comparable time to the laravel span, but not the transaction time (2688us).

I am not blaming extension initialization. I am saying the time reported by the agent is far from reasonable.

In the link you provided:

To give you an idea, RSHUTDOWN() is called:

  • After userland shutdown functions have been executed ( register_shutdown_function() )
  • After every object destructor have been called
  • After PHP output buffer has been flushed
  • After max_execution_time has been disabled

However, seems that the transaction ended even before the userland code started. Either RSHUTDOWN was called before userland code started (is that possible?) or the transaction was stopped early by some other code. This is reproducible on my setup. Let me know if you want me to dump any debug log to help investigate.

I'm sorry I completely misread your initial question - now I understand the issue.
The answer to your question:

Is the transaction only meant to capture extension initialization time? Or the entire request processing time?

is that the automatically created transaction is meant to capture the entire request processing. Of course if developer creates transactions and spans using agent's public API then she is free to do anything she wants and transaction can be ended well before the child span(s). But in this case the transaction in question was created automatically by the agent so this should not happen. I think it's the most likely a bug - I'll try to reproduce your use case locally to verify. I've created issue #153 to investigate and fix this bug.

1 Like

This topic was automatically closed 20 days after the last reply. New replies are no longer allowed.