Number of transactions per minute reported by APM differs significantly from the request count in monitoring. What are possible reasons for the discrepancy?

Kibana version: 7.1.1 oss (AWS Elasticsearch Service)

Elasticsearch version: 7.1.1 oss (AWS Elasticsearch Service)

APM Server version: 7.1.1 oss

APM Agent language and version: Ruby/Rails 7.1.1

Our application is a Rails REST API running on AWS Elastic Beanstalk. It serves no static assets, and every endpoint ultimately calls the Rails app.

The application receives about 50k HTTPS requests per minute, according to the EB monitoring console.

In APM, however, only about 10k requests per minute end up recorded as APM transactions.

Is there anything obvious I should check, or a known reason these two numbers might differ? I'm fairly new to this stack, so I'm not sure where the discrepancy could come from.

EDIT: I have a load balancer in front of multiple APM Servers, and the agents point at that load balancer. I found that when I scale up the number of APM Servers, the number of transactions reaching Elasticsearch increases, though not quite linearly. CPU usage on the APM Servers is not very high, there is no limit on simultaneous requests, and I can't think of any other bottleneck. Nevertheless, adding APM Servers does let more transactions make it through. Does this ring a bell for anyone?

Hi Ryan!
I’m sorry to hear you’ve had trouble capturing the expected number of APM transactions.

What are your transaction_sample_rate and pool_size settings? The APM agent buffers the events it sends to the APM Server in one or more queues, depending on pool_size. If a queue fills up, events are dropped.

You can observe this behavior in the APM agent logs, which will contain warnings about the buffer being full. I'd suggest looking at those logs and comparing them with the requests in your application logs.

Depending on what you find, you can increase pool_size to add more threads that buffer and send events to the APM Server, and adjust transaction_sample_rate so that you capture the number of transactions you want.
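The "queue fills up, events are dropped" behavior can be pictured with a small Ruby sketch. This is only an illustration, not the agent's actual code: the DroppingBuffer class and the capacity of 4 are hypothetical, and the queue is Ruby's standard SizedQueue with a non-blocking push, which raises ThreadError when the queue is full.

```ruby
require "thread"

# Hypothetical illustration of a bounded send buffer that drops events
# when full. This is NOT the Elastic APM agent's implementation.
class DroppingBuffer
  attr_reader :dropped

  def initialize(capacity)
    @queue = SizedQueue.new(capacity)
    @dropped = 0
  end

  # Non-blocking push: when the queue is full, SizedQueue raises
  # ThreadError, and we count the event as dropped instead of blocking
  # the request thread.
  def push(event)
    @queue.push(event, true)
    true
  rescue ThreadError
    @dropped += 1
    false
  end

  def size
    @queue.size
  end
end

buffer = DroppingBuffer.new(4)
# Simulate a burst of 10 transactions while no sender thread drains the
# queue (for example, because the APM Server is slow to accept them).
10.times { |i| buffer.push("transaction-#{i}") }
puts "buffered=#{buffer.size} dropped=#{buffer.dropped}"
# prints "buffered=4 dropped=6"
```

In this model, more sender threads only help if whatever they send to can keep up; if the receiving side is slow, the buffer fills again and drops resume.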

Let us know if you’re able to resolve the discrepancy and if we can help any further!

Hey Emily, thanks for the response.

It doesn't look like the agent is having trouble sending the data: if I increase the number of load-balanced instances running apm-server, more transactions make it across.

The agent itself currently runs across 12 load-balanced Ruby/Rails instances, so I don't think it is overwhelming the pool size or its buffers.

Can you think of any reason the APM Server(s) would choke as if overwhelmed while their CPU usage stays relatively low?

Hi Ryan, it could also be that the apm-server instances are saturated and dropping events. As a next step, I'd recommend inspecting the apm-server logs or the Monitoring tab in Kibana. These troubleshooting docs may also help: https://www.elastic.co/guide/en/apm/server/7.4/troubleshooting.html
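One way to see how an APM Server can be saturated while its CPU stays low: if each instance forwards events to Elasticsearch through a fixed number of in-flight bulk requests, its throughput ceiling is set by how long Elasticsearch takes to acknowledge each batch, not by CPU. The Ruby sketch below assumes a simple workers × batch size ÷ latency model. The method names, the worker count, the batch size, and the 250 ms latency are all hypothetical, not apm-server defaults or measurements. The model also ignores spans, so real event volume per transaction is usually higher.

```ruby
# Hypothetical back-of-the-envelope capacity model (illustrative numbers
# only). A latency-bound sender's ceiling is roughly:
#   events/sec per server ~= workers * batch_size / latency_seconds
def max_events_per_second(workers:, batch_size:, latency_seconds:)
  workers * batch_size / latency_seconds.to_f
end

# Number of servers needed to absorb a given per-minute event rate,
# assuming load is spread evenly and the backend is not the limit.
def servers_needed(events_per_minute:, per_server_eps:)
  (events_per_minute / 60.0 / per_server_eps).ceil
end

per_server = max_events_per_second(workers: 1, batch_size: 50, latency_seconds: 0.25)
puts "per-server ceiling: #{per_server} events/s (#{(per_server * 60).round} per minute)"

needed = servers_needed(events_per_minute: 50_000, per_server_eps: per_server)
puts "servers needed for 50k events/min: #{needed}"
```

Under this model, adding servers raises the overall ceiling only until Elasticsearch itself becomes the limit, which is one possible reason the gain from scaling is less than linear.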

Thank you for the link. I'm trying some of the suggestions there and seeing some success, though I'll probably still need to increase the number of load-balanced apm-server instances. Appreciate the help!

Glad to hear that!
