503 return codes in APM

mikemadden42 · March 28, 2019, 7:32pm

Description of the problem including expected versus actual behavior. Please include screenshots (if relevant):

We've started getting 503 return codes in our APM application. The application is written in Node.js. Here are the error messages we're seeing:

Elastic APM HTTP error (503): queue is full: queue is full
Elastic APM HTTP error (503): timeout waiting to be processed

These errors occur about 20 times over the course of a day.

Kibana version: 6.6.1

Elasticsearch version: 6.6.1

APM Server version: 6.6.1

APM Agent language and version: Node.js, 2.7.0

Browser version: Chrome 73.x

Original install method (e.g. download page, yum, deb, from source, etc.) and version: YUM

Fresh install or upgraded from other version? Fresh install

mikemadden42 · March 28, 2019, 7:48pm

What's the best way to narrow down the source of the 503 return codes? I do not see the exact error message in the apm-server source.

apm-server(master): pt 'queue is full'
./beater/test_approved_stream_result/TestRequestIntegrationFullQueue.approved.json
5:            "message": "queue is full"

./beater/common_handler.go
129:			err:     errors.Wrap(err, "queue is full"),

./processor/stream/test_approved_stream_result/testIntegrationResultQueueFull.approved.json
5:            "message": "queue is full"

./docs/events-api.asciidoc
61:For example: queue is full, IP rate limit reached, wrong metadata, etc.
81:      "message": "queue is full" <3>

./publish/pub.go
58:	ErrFull          = errors.New("queue is full")
119:// Send tries to forward pendingReq to the publishers worker. If the queue is full,

./vendor/golang.org/x/sys/unix/zerrors_darwin_386.go
1743:	{106, "EQFULL", "interface output queue is full"},

./vendor/golang.org/x/sys/unix/zerrors_darwin_arm64.go
1743:	{106, "EQFULL", "interface output queue is full"},

./vendor/golang.org/x/sys/unix/zerrors_darwin_amd64.go
1743:	{106, "EQFULL", "interface output queue is full"},

./vendor/golang.org/x/sys/unix/zerrors_darwin_arm.go
1743:	{106, "EQFULL", "interface output queue is full"},

wa7son · March 28, 2019, 7:56pm

Hi @mikemadden42

It sounds like an error from the APM Server, correct?

Have you looked at the Common Problems section in the docs? It has a section on troubleshooting 503 errors.

Best,
Thomas

mikemadden42 · March 28, 2019, 8:03pm

Thanks @wa7son. I did look over the Common Problems document. It implies that if we receive only 503 return codes, an Elasticsearch disk may be full. We only get 503 responses when the error occurs, but I've verified that we have plenty of disk space on our Elastic Stack cluster.

wa7son · March 28, 2019, 8:16pm

It sounds a little like the APM Server can't keep up with the amount of data that's being sent to it. Do you know approximately how many HTTP requests and events it's receiving?

If that's the issue, the recommended solution is to spin up multiple APM Servers behind a load balancer.

mikemadden42 · March 28, 2019, 8:25pm

Hi @wa7son, I'm leaning towards the same conclusion: perhaps our single APM server can't keep up with the requests. In our production cluster, we've received right at 56 million events over the past 24 hours, and almost 240 million events over the past week.
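As a quick back-of-the-envelope check, those totals translate into average ingest rates as computed below. Averages hide peaks, so intermittent 503s are more plausibly caused by short bursts well above these figures; that is an inference, not something measured in this thread. perSecond is a hypothetical helper name.

```go
package main

import "fmt"

// perSecond converts an event total over a window into an average rate.
func perSecond(events, seconds float64) float64 { return events / seconds }

func main() {
	const day = 24 * 60 * 60 // 86,400 seconds
	const week = 7 * day     // 604,800 seconds
	fmt.Printf("last 24h: %.0f events/s\n", perSecond(56e6, day))    // prints "last 24h: 648 events/s"
	fmt.Printf("last week: %.0f events/s\n", perSecond(240e6, week)) // prints "last week: 397 events/s"
}
```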

mikemadden42 · March 28, 2019, 8:40pm

Do you think it's worth tuning our existing APM server? We've left it mostly at the defaults:

[root@apmsrv ~]# egrep -v '^[[:blank:]]*#|^$' /etc/apm-server/apm-server.yml
apm-server:
  host: "apmsrv.somedomain.com:8200"
  frontend:
    enabled: false
  ssl.enabled: true
  ssl.certificate: "/etc/pki/tls/certs/apmsrv.crt"
  ssl.key: "/etc/pki/tls/private/apmsrv.key.pem"
setup.template.settings:
  index.number_of_shards: 2
  index.codec: best_compression
setup.kibana:
  host: "https://kibana.somedomain.com:5601"
output.elasticsearch:
  hosts: ["ingest01.somedomain.com:9200", "ingest02.somedomain.com:9200"]
  protocol: "https"
  username: "elastic"
  password: ${elastic_pass}
apm-server.rum.enabled: true
apm-server.rum.rate_limit: 10
apm-server.rum.allow_origins: ['*']
apm-server.rum.library_pattern: "node_modules|bower_components|~"
apm-server.rum.exclude_from_grouping: "^/webpack"
apm-server.rum.source_mapping.cache.expiration: 5m
apm-server.rum.source_mapping.index_pattern: "apm-*-sourcemap*"

gil · March 28, 2019, 10:20pm

The defaults are rather conservative. It's almost certainly worth tuning your APM Server settings to match the hardware you're running it on.

You can directly increase the size of the queue that's filling up with queue.mem.events. You'll also want to adjust the number of output workers (output.elasticsearch.worker) accordingly. The APM Server tuning documentation describes many other tunables you can adjust for your workload.

I'd also suggest enabling monitoring if possible to help guide some of these changes.

system · April 18, 2019, 6:20pm

This topic was automatically closed 20 days after the last reply. New replies are no longer allowed.
