Failed to connect to backoff(elasticsearch(https://<es>)): Connection marked as failed because the onConnect callback failed: open /usr/share/apm-server/bin/ingest/pipeline/definition.json: no such file or directory

Hello,

We are running ELK stack v7.11.2 in a 3-node cluster (RHEL7) and are trying to set up APM Server with the Node.js agent.

APM Server is installed on 2 of the 3 nodes, and each instance will point to all 3 ES nodes. Below is the setup:

Node 1:

  1. Installed apm-server-7.11.2-1.x86_64
  2. apm-server.yml:
apm-server:
  host: "<ip>:8200"
  ssl:
    enabled: true
    certificate: /etc/elasticsearch/secure/ES-SIT-NODE-1/ELK-DEV.cer
    key: /etc/elasticsearch/secure/ES-SIT-NODE-1/ELK-DEV.key

output.elasticsearch:
  hosts: ["<host>"]
  username: elastic
  password: elastic
  ssl.verification_mode: none

logging.level: debug
logging.to_files: true
logging.files:
  path: /etc/apm-server/logs
  name: apm-server
  keepfiles: 7
  permissions: 0644

APM data received through this node is being indexed in Elasticsearch. Excerpt from the apm-server log file:

{"log.level":"debug","@timestamp":"2022-10-03T15:56:56.775+0100","log.logger":"elasticsearch","log.origin":{"file.name":"elasticsearch/client.go","file.line":230},"message":"PublishEvents: 1 events have been published to elasticsearch in 13.53063ms.","ecs.version":"1.6.0"}

Node 2:
Installed RPM: apm-server-7.11.2-1.x86_64
However, when the agent is pointed to node 2, nothing is indexed in ES, and APM Server logs the following error:

{"log.level":"error","@timestamp":"2022-10-03T16:37:14.576+0100","log.logger":"publisher_pipeline_output","log.origin":{"file.name":"pipeline/output.go","file.line":154},"message":"Failed to connect to backoff(elasticsearch(<es>)): Connection marked as failed because the onConnect callback failed: open /usr/share/apm-server/bin/ingest/pipeline/definition.json: no such file or directory","ecs.version":"1.6.0"}

The file /usr/share/apm-server/bin/ingest/pipeline/definition.json does not exist on either of the two nodes.

Node.js agent:

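// APM agent --------------------------------------------------
// Start the agent before any other module is required so it can instrument them.
const apm = require('elastic-apm-node').start({
  // Name under which this service appears in the APM data
  serviceName: 'kafka-ui-dxm-app',
  // Address of the APM Server (placeholder kept from the original post)
  serverUrl: '<es>:8200',
  // verifyServerCert: false,
  // CA certificate used to verify the APM Server's TLS certificate
  serverCaCertFile: './ssl/AWS-ES-CLUSTER-CA.pem',
  environment: 'elk-dev'
});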

Can you please advise what might be going wrong?

Thanks

Hi @preetish_P,

It looks like the APM Server on the 2nd node is unable to access its pipeline definition, which seems strange. However, this is an old version that I'm not very familiar with, and it may have a bug.

  1. Could you make sure that your configuration file on the 2nd node is exactly the same as the configuration file on your first node?
  2. If both files are the same, or the error persists after aligning them, you can disable the creation of the ingest pipeline on the 2nd node (see General configuration options | APM User Guide [7.16] | Elastic). I highly recommend that at least one of the APM Servers keeps that option set to true (the default); otherwise APM Server may not work as expected.

7.11.x is quite an old version to be running. Is there any chance you can upgrade your stack to a more recent version of the 7.x branch? 7.17.x is going to be supported for a while, so you may want to use that instead.

Hi @marclop

Thanks for your reply. Adding the below in apm-server.yml resolved the issue:

apm-server.register.ingest.pipeline.enabled: false

Can both nodes have this setting once APM is 'bootstrapped', i.e. once the apm* indices are created?

Thanks

They could, but if you upgrade the APM Server version and forget to switch that setting back to true, you will be running with outdated ingest pipelines.

I would recommend having at least 1 APM Server with that setting set to true.
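
As a rough illustration of that recommendation, the two apm-server.yml files might differ only in this one setting. This is a sketch, not a verified configuration: the flat key is the one quoted earlier in the thread, the assignment of "true" to node 1 and "false" to node 2 is an assumption based on which node hit the error, and the `---` line only separates the two fragments for display (each fragment belongs in its own file).

```yaml
# Node 1, apm-server.yml (fragment):
# keep registering the ingest pipeline on startup. true is the default,
# so this line is optional; spelling it out makes the intent explicit.
apm-server.register.ingest.pipeline.enabled: true
---
# Node 2, apm-server.yml (fragment):
# skip pipeline registration, relying on node 1 to create and update it.
apm-server.register.ingest.pipeline.enabled: false
```

With this split, node 1 would need to be upgraded and restarted along with node 2 on every version change, so that the pipelines in Elasticsearch match the running APM Server version.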

