IOException from APM Server

Kibana version: 7.8.0
Elasticsearch version: 7.8.0
APM Server version: 7.8.0
APM Agent language and version: Java, 1.17.0
Fresh install or upgraded from other version?: Fresh

I am trying out Elastic APM on my dev setup, which runs in Docker containers on docker-for-windows. I was able to quickly set up Elasticsearch, Kibana and APM Server using the Docker Compose file from https://www.elastic.co/guide/en/apm/get-started/current/quick-start-overview.html. I modified the default apm-server.yml with these values: read_timeout: 15s, write_timeout: 15s and max_event_size: 1024000 (bytes). The Docker containers came up fine. I'm testing this with a Java application (that uses jdk8.0.252 linux_x64), which also runs in a Docker container. I downloaded the APM agent, elastic-apm-agent-1.17.0.jar (from Maven), and mounted it into the app container. I started the application with -Delastic.apm.config_file=//elasticapm.properties, using these settings:

    recording=true
    instrument=true
    service_name=ma
    hostname=ma
    environment=docker-dev
    transaction_max_spans=1000
    sanitize_field_names=<headers-to-exclude>
    ignore_urls=<urls-to-skip-like-healthy>
    server_urls=http://host.docker.internal:8200
    server_timeout=15s
    max_queue_size=2000
    api_request_time=15s
    api_request_size=1mb
    metrics_interval=15s
    application_packages=<app-java-packages>
    stack_trace_limit=100
    log_level=DEBUG

However, I frequently see the IOException below from the server. I have read https://www.elastic.co/guide/en/apm/server/current/common-problems.html#io-timeout. Can the timeouts be the same, or do they have to be incremental? Or is this a different issue?

    2020-07-18 16:44:28,492 [elastic-apm-server-reporter] DEBUG co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - Sending payload to APM server failed
    java.io.IOException: Server returned HTTP response code: 400 for URL: http://host.docker.internal:8200/intake/v2/events
    	at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1900) ~[?:1.8.0_252]
    	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1498) ~[?:1.8.0_252]
    	at co.elastic.apm.agent.report.AbstractIntakeApiHandler.endRequest(AbstractIntakeApiHandler.java:129) [?:?]
    	at co.elastic.apm.agent.report.IntakeV2ReportingEventHandler.endRequest(IntakeV2ReportingEventHandler.java:162) [?:?]
    	at co.elastic.apm.agent.report.IntakeV2ReportingEventHandler.handleEvent(IntakeV2ReportingEventHandler.java:85) [?:?]
    	at co.elastic.apm.agent.report.IntakeV2ReportingEventHandler.onEvent(IntakeV2ReportingEventHandler.java:73) [?:?]
    	at co.elastic.apm.agent.report.IntakeV2ReportingEventHandler.onEvent(IntakeV2ReportingEventHandler.java:44) [?:?]
    	at co.elastic.apm.agent.shaded.lmax.disruptor.BatchEventProcessor.processEvents(BatchEventProcessor.java:168) [?:?]
    	at co.elastic.apm.agent.shaded.lmax.disruptor.BatchEventProcessor.run(BatchEventProcessor.java:125) [?:?]
    	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_252]
    2020-07-18 16:44:28,495 [elastic-apm-server-reporter] WARN  co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - {
      "accepted": 0,
      "errors": [
        {
              "message": "read tcp 172.20.0.4:8200-\u003e172.20.0.1:41842: i/o timeout"
        }
      ]
    }

I consistently see the above exception when server_timeout=30s and api_request_time=30s.

If you change api_request_time in the agent, you need to ensure that the server's read_timeout and write_timeout are both greater by a significant amount. By default api_request_time is 10 seconds, and read_timeout and write_timeout are both 30 seconds.

The read_timeout and write_timeout settings control how long the server will allow for reading the request body and for writing the response. api_request_time, on the other hand, controls how long the agent keeps the request body open for streaming events. If the agent keeps the request body open for longer than the server allows, the server gives up on the read and you get this error.
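
To make the relationship concrete, here is a minimal sketch of the check described above. It is not part of the agent or the server; the class name, method name and the 10-second margin are assumptions chosen for illustration, since "greater by a significant amount" is not quantified here.

```java
import java.time.Duration;

public class TimeoutCheck {
    // Hypothetical safety margin; an assumption, not an official Elastic value.
    static final Duration MARGIN = Duration.ofSeconds(10);

    // Returns true when both server timeouts exceed the agent's
    // api_request_time by at least MARGIN.
    static boolean serverOutlastsAgent(Duration apiRequestTime,
                                       Duration readTimeout,
                                       Duration writeTimeout) {
        Duration needed = apiRequestTime.plus(MARGIN);
        return readTimeout.compareTo(needed) >= 0
                && writeTimeout.compareTo(needed) >= 0;
    }

    public static void main(String[] args) {
        // Defaults quoted in this thread: agent 10s, server 30s / 30s.
        System.out.println(serverOutlastsAgent(
                Duration.ofSeconds(10), Duration.ofSeconds(30), Duration.ofSeconds(30)));
        // Values from the question: agent 15s, server 15s / 15s.
        System.out.println(serverOutlastsAgent(
                Duration.ofSeconds(15), Duration.ofSeconds(15), Duration.ofSeconds(15)));
    }
}
```

With the defaults the check passes. With the values from the question it fails, because the agent may keep the request open for as long as the server is willing to read it.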

Why are you setting these config values? Typically you shouldn't need to change them, so unless you have a specific reason to, I would recommend leaving them unset so that the defaults apply.

Thanks for the response, and good to know. Since the documentation for these properties is pretty much a one-liner each (https://www.elastic.co/guide/en/apm/server/current/configuration-process.html), I wanted to better understand how these properties relate to each other and what they mean. This matters specifically for certain legacy services / apps where the number of tracing spans could be > 2k, with quite a few remote service invocations and DB hits, and also where requests are long-running for bulk / batch processing.
