50 percent degradation in API performance while using APM agent


(Akash Patel) #1

Hi,

We are currently in the process of adding Elastic APM. But after performance testing the API with the APM agent enabled, the throughput of the API is reduced by 50%.

Throughput before adding APM agent - 15200/sec
Throughput after adding APM agent - 8100/sec

APM agent configuration:

  logLevel: 'info',
  serverTimeout: '10s',
  captureExceptions: true,
  sourceLinesErrorAppFrames: 5,
  sourceLinesErrorLibraryFrames: 0,
  captureErrorLogStackTraces: 'messages',
  captureSpanStackTraces: false,
  stackTraceLimit: 15,
  transactionSampleRate: 1,
  captureBody: false,
  instrument: true,
  disableInstrumentations: ['redis', 'mysql'],
  transactionMaxSpans: 50,
  apiRequestTime: '10s',
  apiRequestSize: '750kb'

We tried several different configurations, but there wasn't much improvement in throughput.

Lowering transactionSampleRate, apiRequestTime, and apiRequestSize in particular gave only an insignificant improvement.

Do we need to tune the apm-server configuration?
Is there anything else that needs to be tuned in the APM agent?

Please let me know if you need more info on this.

Kibana version: 6.5.4

Elasticsearch version: 6.5.4

APM Server version: 6.5.4

APM Agent language and version: Node.js, 2.1.0

Thanks


(Ronald Tumulak) #2

Have you tried scaling out Elastic APM? Intake API bandwidth can be a problem at the volumes above. Even assuming a 10% sampling rate, that is still about 800 transactions sent per second, easily exceeding 800 kB/s.
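Back-of-the-envelope, the intake volume can be sketched like this (the ~1 kB average payload per serialized transaction is an assumption; actual size depends on spans and metadata):

```javascript
// Rough intake estimate for the numbers in this thread.
const throughputPerSec = 8100;    // observed API throughput with the agent on
const sampleRate = 0.1;           // hypothetical 10% sampling
const avgTransactionBytes = 1024; // assumed ~1 kB per serialized transaction

const sampledPerSec = Math.round(throughputPerSec * sampleRate);
const intakeBytesPerSec = sampledPerSec * avgTransactionBytes;

console.log(`${sampledPerSec} tx/s, ~${intakeBytesPerSec / 1024} kB/s to the intake API`);
```

So even at 10% sampling the intake API has to absorb on the order of 800 kB/s, before protocol overhead.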

Also, do you see CPU or memory-bound bottlenecks on the instrumented services? You might want to play around with those settings.


(Akash Patel) #3

Hi,

Thanks for your time

For now we are running APM Server as a single instance on one machine with 40 cores and 512 GB of RAM.
I think you are talking about this parameter, right?

You said scaling out, so should I try multiple instances of APM Server on the same machine, or each instance running on a different machine?

I thought of running multiple instances of APM Server on the same machine and load balancing them with NGINX. Will this help?
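Something like the following is what I had in mind (a sketch; the instance count and ports 8201-8203 are assumptions):

```nginx
# Three APM Server instances on one machine (ports 8201-8203),
# fronted by NGINX on the default APM port 8200.
upstream apm_servers {
    least_conn;
    server 127.0.0.1:8201;
    server 127.0.0.1:8202;
    server 127.0.0.1:8203;
}

server {
    listen 8200;
    location / {
        proxy_pass http://apm_servers;
    }
}
```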

Thanks


(Ronald Tumulak) #4

You can try multiple instances of the APM Server on the same machine and load balance them from the front. I've done it with HAProxy in front of three dockerised APM Servers on a single machine, and it improved our APM intake throughput. It also helped that we set up a proper Elasticsearch cluster: 3 master nodes, 4 data nodes. Keep monitoring the resources to see whether you are CPU, memory, or network bound. In our case we did a bit of custom ingest plus GeoIP and user-agent processing, so CPU use was a bit high.


(Akash Patel) #5

Thanks for your guidance. Will try the same, and also monitor CPU+Mem usage.

Meanwhile,

Can you give me some suggestions to begin the scaling with?
Assume the machine has 40 cores and 512 GB of RAM.

  • How many instances of APM Server should I run to begin with?

  • What values should I set for the following properties in the YAML file for each instance to begin with?
    queue.mem.events, output.elasticsearch.bulk_max_size, output.elasticsearch.worker

  • Do I need to change any other properties in the YAML file in order to optimize APM Server?
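For reference, this is roughly the per-instance configuration I'd be tuning (a sketch; all values and hostnames are placeholders to adjust against the load test):

```yaml
apm-server:
  host: "0.0.0.0:8201"      # unique port per instance behind the balancer

queue.mem.events: 8192      # in-memory event buffer per instance

output.elasticsearch:
  hosts: ["es-node1:9200"]  # hypothetical Elasticsearch host
  worker: 4                 # concurrent bulk workers
  bulk_max_size: 5120       # events per bulk request
```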

Thanks


(Ronald Tumulak) #6

To be honest, the right number really depends on the amount and type of instrumentation you are doing. Trial and error is the way to go: start small, run the load test, then observe CPU and RAM along with I/O.

On my laptop (a 2018 MacBook with 32 GB RAM), I have 3 APM Servers running with 2 GB heap each and a 7-node Elasticsearch cluster. It can easily ingest a combined 4000 calls per minute of a fairly complex transaction (distributed across six Java services) at a 100% sampling rate.


(Akash Patel) #7

Sure. Will try it out.

Thanks


(Stephen Belanger) #8

I would turn the sample rate way down. A good balance is aiming for 50-100 traces per second. Setting sourceLinesErrorAppFrames to 0 and captureErrorLogStackTraces to 'never' might also help.
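As a sketch, those suggestions map onto the agent config like this (the 0.01 rate is only an example: at roughly 8100 requests/sec, a ~1% sample lands near the suggested 50-100 traces per second):

```javascript
// elastic-apm-node start options (sketch; merge with your existing config)
const apm = require('elastic-apm-node').start({
  transactionSampleRate: 0.01,        // ~81 traces/sec at 8100 req/sec
  sourceLinesErrorAppFrames: 0,       // skip source-line collection for errors
  captureErrorLogStackTraces: 'never'
});
```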


(Akash Patel) #9

Sure will try this out and let you know. Thanks


(Felix Barnsteiner) #10

Did you try out the suggestions? Did they help to recover the throughput?