APM Server transport error (503): Unexpected APM Server response


Kibana version: 7.0.0

Elasticsearch version: 7.0.0

APM Server version: 7.0.0

APM Agent language and version: Node.js

Browser version:

Original install method (e.g. download page, yum, deb, from source, etc.) and version: yum

**Fresh install or upgraded from other version?** Fresh install

Is there anything special in your setup?

Description of the problem including expected versus actual behavior. Please include screenshots (if relevant):

Steps to reproduce:

  1. Stop the APM Server.
  2. The application log shows the error: "APM Server transport error (503): Unexpected APM Server response"
  3. Restart the APM Server.
  4. Our application keeps running uninterrupted, but we still see "APM Server transport error (503): Unexpected APM Server response"

Errors in browser console (if relevant):

Provide logs and/or server output (if relevant):
"APM Server transport error (503): Unexpected APM Server response"

We wanted to test the robustness of our app and APM infrastructure, so we deliberately stopped the APM Server and restarted it. The issue we are observing is that the agent can't reconnect to the server. It keeps logging "APM Server transport error (503): Unexpected APM Server response".

Is there a workaround for this behavior?

Thank you.

To put it another way, we want the agent inside the app to automatically reconnect to the APM Server, but it does not seem to be able to. Is this a known issue?

@meiyuan @npilchbluescape I can share with you some ideas that could be worth exploring.

  1. You could enable debug logging in your APM Server and check for further errors. This would help you rule out other issues (refer to the 503 errors described in our documentation). You can also enable debug logging in the APM Node.js Agent and check for further errors there.

  2. It is possible that the APM Node.js Agent is still using the same HTTP request / TCP socket (the one created while the APM Server was up) after you restart the APM Server. Tuning the values of serverTimeout and apiRequestTime might help.

  3. It would be worth checking whether you see the same behaviour with the latest versions of APM Server and the APM Node.js Agent.
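
As a rough illustration of points 1 and 2, the agent options could be assembled as sketched below. `logLevel`, `serverTimeout` and `apiRequestTime` are documented agent options; the helper name `buildAgentOptions`, the placeholder service name and the specific durations are assumptions chosen for the example, and nothing in this thread confirms that shorter values resolve the reconnect problem.

```javascript
// Sketch only: the values below are illustrative assumptions, not recommendations.
// buildAgentOptions is a hypothetical helper, not part of elastic-apm-node.
function buildAgentOptions(env = process.env) {
  return {
    // Placeholder service name used when the variable is not set
    serviceName: env.ELASTIC_APM_SERVICE_NAME || 'my-service',
    serverUrl: env.ELASTIC_APM_SERVER_URL || 'http://localhost:8200',
    // Verbose agent logging, to see what happens around the server restart
    logLevel: 'debug',
    // Shorter than the documented defaults (30s and 10s), as an experiment
    serverTimeout: '10s',
    apiRequestTime: '5s',
  }
}

// In the application entry point, before any other module is loaded:
// const apm = require('elastic-apm-node').start(buildAgentOptions())
```

Keeping the options in a plain function makes it easy to log or compare the effective values between test runs.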

I hope that helps.

Hi Romain,
Thank you for your quick response. I tried what you suggested, but I haven't found anything useful yet.

I enabled debug logging in the APM Server with the command `apm-server -e -d "publish"`, but nothing out of the ordinary shows up; it only shows the start-up log. After that, no matter what I do with the agent, no further logs are produced. You can see the log here: https://pastebin.com/9pFKdW3M

I also enabled logging in the agent and set it to debug. However, I couldn't find anything useful to investigate further: https://pastebin.com/9J59Czun

We didn't change the default values of serverTimeout and apiRequestTime. Do you have any suggestions?

Thank you again.
Meiyuan

@meiyuan Thank you for the feedback.

  1. Which version of the APM Node.js Agent are you using? Can you run the following command and send us the results: `npm info elastic-apm-node`

  2. Which version of Node.js are you using?

  3. Are you by any chance using the connect module? If you could share a sample of your application having the issue, that would be great.

Thank you.

Hi Romain,
Thank you for your help.

```
elastic-apm-node@3.3.0 | BSD-2-Clause | deps: 29 | versions: 86
The official Elastic APM agent for Node.js
https://github.com/elastic/apm-agent-nodejs

keywords: opbeat, elastic, elasticapm, elasticsearch, log, logging, bug, bugs, error, errors, exception, exceptions, catch, monitor, monitoring, alert, alerts, performance, apm, ops, devops, stacktrace, trace, tracing, distributedtracing, distributed-tracing

dist
.tarball: https://registry.npmjs.org/elastic-apm-node/-/elastic-apm-node-3.3.0.tgz
.shasum: c29fb42ab0bf2d2d7292ab2b650cfbe65b0ccbb6
.integrity: sha512-rJNv/NSzJUaGfFkvHXRaNrFKTPCM4h3xt1OmHBq+foMqSnRBFU0o41HCkNtXw9JWKkMblxIknC3EHkwJ8yWiDA==
.unpackedSize: 251.5 kB

dependencies:
after-all-results: ^2.0.0        elastic-apm-http-client: ^9.3.0  measured-reporting: ^1.51.1      redact-secrets: ^1.0.0
async-value-promise: ^1.1.1      end-of-stream: ^1.4.4            monitor-event-loop-delay: ^1.0.0 relative-microtime: ^2.0.0
basic-auth: ^2.0.1               fast-safe-stringify: ^2.0.7      object-filter-sequence: ^1.0.0   require-ancestors: ^1.0.0
console-log-level: ^1.4.1        http-headers: ^3.0.2             object-identity-map: ^1.0.2      require-in-the-middle: ^5.0.2
cookie: ^0.4.0                   http-request-to-url: ^1.0.0      original-url: ^1.2.3             semver: ^6.3.0
core-util-is: ^1.0.2             is-native: ^1.0.1                read-pkg-up: ^7.0.0              set-cookie-serde: ^1.0.0
(...and 5 more.)

maintainers:
- qard <admin@stephenbelanger.com>
- watson <w@tson.dk>

dist-tags:
false: 2.17.2  latest: 3.3.0

published a month ago by axw <axwalk@gmail.com>
```

```
sh-4.2$ node -v
v10.15.3
```

Unfortunately, no, we don't use the connect module.

Here is how I set up the agent with debug mode on:

```
if (config.get('elastic_apm.serviceEnabled')) {
  // Add this to the VERY top of the first file loaded in your app
  var apm = require('elastic-apm-node').start({
    // Override service name from package.json
    // Allowed characters: a-z, A-Z, 0-9, -, _, and space
    serviceName: config.get('elastic_apm.serviceName'),
    // Set custom APM Server URL (default: http://localhost:8200)
    serverUrl: `${config.get('elastic_apm.serverUrl')}:${config.get('elastic_apm.serverPort')}`,
    logLevel: config.get('elastic_apm.logLevel'),
  })
}
```

And this is the config file where we load the environment variables:

```
  elastic_apm: {
    serviceName: {
      format: 'String',
      default: '',
      env: 'ELASTIC_APM_SERVICE_NAME',
    },
    serverUrl: {
      format: 'String',
      default: '',
      env: 'ELASTIC_APM_URL',
    },
    serverPort: {
      format: 'String',
      default: '',
      env: 'ELASTIC_APM_PORT',
    },
    serviceEnabled: {
      format: 'Boolean',
      default: false,
      env: 'ELASTIC_APM_SERVICE_ENABLED',
    },
    logLevel: {
      format: 'String',
      default: 'debug',
      env: 'ELASTIC_APM_LOG_LEVEL',
    },
  },
```
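
Since the server URL above is concatenated from two settings that both default to an empty string, it may be worth validating the result before it is handed to the agent. The sketch below is one possible way to do that; `buildServerUrl` is a hypothetical helper and the hostname in the usage comment is a placeholder, neither of which comes from the original setup.

```javascript
// Hypothetical helper: compose the APM Server URL from host and port settings
// and fail early if the result is not a usable http(s) URL.
function buildServerUrl(host, port) {
  if (!host) {
    throw new Error('APM server URL is empty (is ELASTIC_APM_URL set?)')
  }
  const raw = port ? `${host}:${port}` : host
  // The WHATWG URL constructor throws a TypeError on malformed input
  const url = new URL(raw)
  if (url.protocol !== 'http:' && url.protocol !== 'https:') {
    // Typically means the scheme is missing, e.g. "apm.example.internal:8200"
    throw new Error(`Unexpected protocol "${url.protocol}" in "${raw}"`)
  }
  return url.origin
}

// Usage with placeholder values:
// buildServerUrl('http://apm.example.internal', '8200')
```

Failing at start-up with a clear message is a design choice here; a misconfigured URL would otherwise only show up later as transport errors in the agent log.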

@meiyuan - Thank you for your patience. I tried to reproduce the issue in the lab, without success. I also discussed it with the team.

  1. We think it would be good to test this scenario with the latest APM Server and check whether the same behaviour occurs. You may also want to consider enabling Stack Monitoring and checking whether anything unusual stands out.

  2. Another point (out of curiosity): are you using a proxy in your environment?

Thank you Romain, I will try what you have suggested.
To answer your second question, this is how we set up the agent and server:

  1. The agent resides in our apps. These apps are containerized in a Kubernetes/OpenShift cluster.
  2. Two separate servers are dedicated to running APM Server, and they are not containerized. We used the RPM to install the server. There is an AWS ELB in front of these two servers.
  3. There is VPC peering between these two clusters.

I hope that helps.
Meiyuan
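
Given the topology described above, one check that might help narrow things down is to compare the HTTP status returned through the load balancer with the status returned by each APM Server directly. This is an assumption, not something confirmed in this thread: a load balancer can itself answer 503 while it considers no backend healthy, which would look the same to the agent. The sketch below uses only Node's built-in `http` module; `probe` is a hypothetical helper, the hostnames in the usage comment are placeholders, and it assumes the server's root path answers a plain GET.

```javascript
const http = require('http')

// Hypothetical helper: GET a URL and resolve with the status code and body.
function probe(url, timeoutMs = 5000) {
  return new Promise((resolve, reject) => {
    const req = http.get(url, (res) => {
      let body = ''
      res.setEncoding('utf8')
      res.on('data', (chunk) => { body += chunk })
      res.on('end', () => resolve({ url, status: res.statusCode, body }))
    })
    // Give up if the endpoint does not respond in time
    req.setTimeout(timeoutMs, () => req.destroy(new Error(`Timed out: ${url}`)))
    req.on('error', reject)
  })
}

// Usage with placeholder hostnames: compare the ELB with one backend.
// Promise.all([
//   probe('http://apm-elb.example.internal:8200/'),
//   probe('http://apm-server-1.example.internal:8200/'),
// ]).then((results) => results.forEach((r) => console.log(r.url, r.status)))
```

If the direct request succeeds while the request through the load balancer still returns 503 after the restart, the load balancer's health checks would be the next thing to look at.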
