Sampled transactions are not found in ElasticAPM UI or Elastic index

We have the following setup in Scala:

val startedTransactions = new AtomicReference(0)
val finishedTransactions = new AtomicReference(0)

val apmTransactions = TrieMap.empty[String, Transaction]

Then we start a stream using Akka based on incoming Kafka messages (pseudo-code):

stream.foreach { message =>
    val tx = ElasticApm
              .currentTransaction()
              .setType("message")
    startedTransactions.getAndUpdate(i => i + 1)
    apmTransactions.put(tx.getId, tx)
}

At the end of the stream:

.runWith(
        Sink.foreach { _ =>
          val tx = ElasticApm.currentTransaction()
          apmTransactions.remove(tx.getId)
          tx.`end`()
          finishedTransactions.getAndUpdate(i => i + 1)
     }
)

Results:

  • We correctly see all 3469 transactions initially in the Map[String, Transaction], which equals the number of Kafka messages we send
  • We see most of the transactions at the end, where they are closed (around 150 are missing; that is a problem for later, not a priority now)
  • But we see only 2184 transactions in the ElasticAPM index itself.

This means that even though we are closing the transactions manually and see 3300 transactions at the end of our stream, we are still missing over 1000 transactions in the index itself.

We use the latest version of the agent (0.18.1) and the latest Elastic Stack (7.9.2) with Java 11.

Hi @milanvdm,

I suggest running the agent with log_level=debug and doing the following:

  • check if there are any errors
  • check if the agent circuit breaker is triggered (it should be disabled by default)
  • check if there are dropped transactions, which can happen if internal buffers are full; in that case transactions are captured but not reported to the apm-server.
  • count the number of occurrences of the terms startTransaction and endTransaction; both should be equal to the expected number of transactions. If that's not the case, it means transactions aren't properly captured.
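
For the last check, a minimal Scala sketch (assuming Scala 2.13 for scala.util.Using) that counts both terms in an agent log file. The file name apm-agent.log and the object and method names are hypothetical; only the two search terms come from the suggestion above, and the exact wording of the agent's debug lines may differ.

```scala
import scala.io.Source
import scala.util.Using

object TransactionLogCount {
  // Counts non-overlapping occurrences of `term` within one line.
  def occurrences(line: String, term: String): Int = {
    @annotation.tailrec
    def loop(from: Int, acc: Int): Int = {
      val idx = line.indexOf(term, from)
      if (idx < 0) acc else loop(idx + term.length, acc + 1)
    }
    if (term.isEmpty) 0 else loop(0, 0)
  }

  // Sums the occurrences of every term over all lines.
  def countTerms(lines: Iterator[String], terms: Seq[String]): Map[String, Int] =
    lines.foldLeft(terms.map(_ -> 0).toMap) { (acc, line) =>
      terms.foldLeft(acc)((m, t) => m.updated(t, m(t) + occurrences(line, t)))
    }

  def main(args: Array[String]): Unit = {
    // Hypothetical default path; pass the real agent log file as the first argument.
    val path = args.headOption.getOrElse("apm-agent.log")
    val terms = Seq("startTransaction", "endTransaction")
    val counts = Using.resource(Source.fromFile(path))(src => countTerms(src.getLines(), terms))
    terms.foreach(t => println(s"$t: ${counts(t)}"))
  }
}
```

If the two counts differ from each other or from the number of Kafka messages, the gap is on the capture side rather than on the reporting side.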

Also, because the log is quite verbose, I would suggest trying this with a lower number of transactions first, then increasing it until you start to see errors or inconsistencies.

Last but not least, the difference of ~150 transactions can probably be explained by errors. If that's the case, keeping them in a collection like a map will create a memory leak, because those references prevent the transaction objects from being garbage-collected. For a similar purpose, we use weak maps in the agent, which don't create memory leaks.
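
To illustrate the weak-map idea, here is a minimal sketch that uses java.util.WeakHashMap from the JDK as a stand-in; it is not the agent's own implementation, and the WeakTracker class is a hypothetical name. Tracked objects are held only through weak references, so an entry that is never removed (for example because an error skipped the final stage of the stream) does not keep its object alive.

```scala
import java.util.{Collections, WeakHashMap}

// Tracks in-flight objects without preventing their garbage collection.
// WeakHashMap compares keys with equals/hashCode, so this assumes the
// tracked type uses identity-based equality.
final class WeakTracker[A <: AnyRef] {
  private val live: java.util.Set[A] =
    Collections.synchronizedSet(
      Collections.newSetFromMap[A](new WeakHashMap[A, java.lang.Boolean]())
    )

  // Registers an object; it stays eligible for GC once nothing else references it.
  def track(a: A): Unit = { live.add(a); () }

  // Returns true if the object was still being tracked.
  def untrack(a: A): Boolean = live.remove(a)

  // Approximate count: entries disappear as their objects are collected.
  def size: Int = live.size()
}
```

In the setup from the question, the transaction would be passed to track when the message arrives and to untrack in the sink. Unlike the TrieMap keyed by transaction id, leftover entries here vanish once nothing else references them.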

Hi @Sylvain_Juge,

In the end, it was just a tuning problem with our APM server. We increased the queue size and max_bulk, and now all transactions appear correctly.

The strange thing is that we didn't see any warning logs on the agent's side saying that our queue was full, but maybe the apm-server was not even responding properly due to load issues.

Cheers,
Milan