APM queue is full (output Kafka)

Hi,

What can I do if the APM Server (7.9.2) continuously logs 503 errors (queue is full)? I checked the tuning options, but for various reasons my output is Kafka rather than Elasticsearch, and I couldn't find any pointers on how to push more data to my Kafka cluster.

Thanks!

Hi @YvorL,
Similar to the Elasticsearch output, the Kafka output has bulk_max_size and bulk_flush_frequency options (set under output.kafka). The options for adjusting the APM Server internal queue (queue.mem.*) are independent of the output, so they apply to Kafka as well.
That said, seeing 503s on a regular basis may simply indicate that your APM Server cannot keep up with the load, in which case you would need to add resources to the server.
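
As a rough sketch of where these settings live in apm-server.yml: the broker address, topic name, and all numeric values below are illustrative assumptions, not recommendations, and should be tuned against your own load.

```yaml
# apm-server.yml (illustrative values only)

# Internal queue: independent of which output is configured.
queue.mem:
  events: 8192              # assumed value: capacity of the internal event queue
  flush.min_events: 2048    # assumed value: minimum batch handed to the output
  flush.timeout: 1s         # assumed value: max wait before flushing a partial batch

output.kafka:
  hosts: ["kafka-1.example.internal:9092"]   # hypothetical broker address
  topic: "apm"                               # hypothetical topic name
  bulk_max_size: 4096        # assumed value: max events per Kafka bulk request
  bulk_flush_frequency: 1s   # assumed value: wait before sending a bulk request
```

A larger queue only absorbs bursts; if the server is constantly saturated, the queue will fill up again regardless of these values.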

Thanks! I've added a new node and all 503 errors disappeared. Now I see a couple of "too large body" (400) entries per minute. Sometimes paired with connection reset.

ERROR   [request]       middleware/log_middleware.go:95 request body too large  {"request_id": "---", "method": "POST", "URL": "---", "content_length": 813677, "remote_address": "---", "user-agent": "---", "response_code": 400, "error": "event exceeded the permitted size."}
ERROR   [request]       middleware/log_middleware.go:95 request body too large  {"request_id": "---", "method": "POST", "URL": "---", "content_length": 2021384, "remote_address": "---", "user-agent": "---", "response_code": 400, "error": "event exceeded the permitted size., event exceeded the permitted size., event exceeded the permitted size., event exceeded the permitted size."}
ERROR   [request]       request/context.go:173  write response  {"request_id": "---", "method": "POST", "URL": "---", "content_length": 2021384, "remote_address": "---", "user-agent": "---", "error": "write tcp XYZ:X->XYZ:X: write: connection reset by peer"}

Could you help me solve this error?

Thanks!

The APM Server has an apm-server.max_event_size config option that defaults to 307200 bytes, which you could increase. Usually I would not expect APM events to exceed this size, though. Could you tell me which APM agent is producing these errors, and whether you have any custom agent-side configuration?
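
For reference, a minimal sketch of raising that limit in apm-server.yml. The value is an illustrative assumption, not something derived from your logs; the logged content_length is the size of the whole request body and only gives an upper bound on the size of a single event.

```yaml
# apm-server.yml (illustrative value only)
apm-server:
  max_event_size: 1048576   # assumed value: 1 MiB instead of the 307200-byte default

# Assumption to verify for your setup: the Kafka output has its own
# max_message_bytes limit (default 1000000), and messages above it are dropped,
# so very large events may need that limit and the broker's message.max.bytes
# raised as well.
```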


It's a custom agent, so that's on me. I also overlooked that setting in the docs.

Thanks again, you helped a lot!
