Heartbeat is crashing due to High Availability Cluster Communication

(Yogesh) #1

For confirmed bugs, please report:

  • Version: Heartbeat 5.4.3; 4-node Elasticsearch cluster, version 6.3.2
  • Operating System: Heartbeat installed on Ubuntu 18.04; Elasticsearch cluster on CentOS 7
  • Discuss Forum URL:

Heartbeat is running on an Ubuntu machine to check the IP status, using the following configuration:

# Configure monitors
heartbeat.monitors:
- type: icmp
  # Configure task schedule
  schedule: '*/10 * * * *'
  hosts: ["192.168.11.76"]

It pushes data to Elasticsearch, and it crashes after 2 days of continuous running of the heartbeat service.

  • Steps to Reproduce:
    systemctl status heartbeat

● heartbeat.service - Heartbeat High Availability Cluster Communication and Membership
Loaded: loaded (/lib/systemd/system/heartbeat.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Thu 2019-05-16 10:15:10 IST; 1h 53min ago
Process: 929 ExecStart=/usr/lib/heartbeat/heartbeat -f (code=exited, status=6)
Main PID: 929 (code=exited, status=6)

May 16 10:15:10 adumaster systemd[1]: Started Heartbeat High Availability Cluster Communication and Memb
May 16 10:15:10 adumaster heartbeat[929]: May 16 10:15:10 adumaster heartbeat: [929]: ERROR: Cannot open
May 16 10:15:10 adumaster heartbeat[929]: May 16 10:15:10 adumaster heartbeat: [929]: info: An annotated
May 16 10:15:10 adumaster heartbeat[929]: May 16 10:15:10 adumaster heartbeat: [929]: info: Please copy
May 16 10:15:10 adumaster heartbeat[929]: May 16 10:15:10 adumaster heartbeat: [929]: ERROR: Heartbeat n
May 16 10:15:10 adumaster heartbeat[929]: May 16 10:15:10 adumaster heartbeat: [929]: ERROR: Configurati
May 16 10:15:10 adumaster systemd[1]: heartbeat.service: Main process exited, code=exited, status=6/NOTC
May 16 10:15:10 adumaster systemd[1]: heartbeat.service: Failed with result 'exit-code'.

Checking the error messages in the log file:
/var/log/heartbeat/heartbeat

2019-05-17T14:21:22+05:30 INFO No non-zero metrics in the last 30s
2019-05-17T14:21:30+05:30 ERR Failed to perform any bulk index operations: Post http://ip:port/_bulk: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-05-17T14:21:30+05:30 INFO Error publishing events (retrying): Post http://ip:port/_bulk: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
2019-05-17T14:21:31+05:30 ERR Connecting error publishing events (retrying): Get http://ip:port: EOF

(Martin R.) #2

It’s heartbeat-elastic, not heartbeat, for the service name. Your systemctl command is for another piece of software called heartbeat that has nothing to do with Elastic.

(Yogesh) #3

For Heartbeat version 5.4 we have to use systemctl status heartbeat.
For Heartbeat 6.5 it is heartbeat-elastic.

(Martin R.) #4

True, my bad. The package and service name were renamed in >=6.0.

So I guess you'll need to follow the doc here:
https://www.elastic.co/guide/en/beats/heartbeat/5.6/setup-repositories.html

But in addition since you already have the conflicting software called "Heartbeat High Availability Cluster Communication and Membership" which currently uses the service name "heartbeat" in your system, you'll have to either remove the conflicting software or rename one of the 2 services so they stop conflicting.

I would guess that you are not using the old conflicting software, so the easiest fix is probably to remove it and make sure your system has only one thing called "heartbeat". Did you try removing the conflicting software and then reinstalling heartbeat? If you are using it, then you need to rename one of the 2 services so they stop conflicting.
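
To confirm which software currently owns the "heartbeat" unit, something like the following rough sketch could help. It only looks for the two strings visible in the systemctl output posted above (the unit description and the ExecStart path); the function name and the "other" label are made up for illustration, and "other" does not prove the unit belongs to Elastic Heartbeat.

```python
import re

# Both markers are copied from the `systemctl status heartbeat` output in post #1.
LINUX_HA_DESCRIPTION = "High Availability Cluster Communication"
LINUX_HA_EXEC = "/usr/lib/heartbeat/heartbeat"


def owner_of_heartbeat_unit(status_text: str) -> str:
    """Classify `systemctl status` output (hypothetical helper).

    Returns "linux-ha" when the text matches the cluster software seen in
    this thread, otherwise "other" (which may or may not be Elastic Heartbeat).
    """
    exec_match = re.search(r"ExecStart=(\S+)", status_text)
    exec_path = exec_match.group(1) if exec_match else ""
    if LINUX_HA_DESCRIPTION in status_text or exec_path == LINUX_HA_EXEC:
        return "linux-ha"
    return "other"


# Sample input: the first lines of the status output quoted in post #1.
SAMPLE_STATUS = """\
● heartbeat.service - Heartbeat High Availability Cluster Communication and Membership
Loaded: loaded (/lib/systemd/system/heartbeat.service; enabled; vendor preset: enabled)
Process: 929 ExecStart=/usr/lib/heartbeat/heartbeat -f (code=exited, status=6)
"""

if __name__ == "__main__":
    print(owner_of_heartbeat_unit(SAMPLE_STATUS))
```

On a live machine the text could come from `subprocess.run(["systemctl", "status", "heartbeat"], capture_output=True, text=True).stdout` instead of the pasted sample.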

Does that help?
You're suffering from:


Which was fixed in heartbeat >=6.0 by renaming the package and service to "heartbeat-elastic".

If the goal of your post was to discuss the error in your Elastic heartbeat logs, then the message seems to indicate that heartbeat cannot connect to your Elasticsearch server. You say it crashes after 2 days of runtime, but your logs say heartbeat cannot connect to Elasticsearch. While it was running for those 2 days, was it shipping events into Elasticsearch?
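
A quick way to test basic connectivity from the Heartbeat machine is a small script like the one below. It is only a sketch: the function name is made up, the URL is a placeholder for the `http://ip:port` elided in your logs, and ignoring proxy environment variables is my assumption about how you want to test the direct path. It tells you whether an HTTP answer comes back within the timeout, not whether bulk indexing would succeed.

```python
import http.client
import urllib.error
import urllib.request


def elasticsearch_reachable(url: str, timeout: float = 5.0) -> bool:
    """Return True if any HTTP response arrives from `url` within `timeout` seconds."""
    # Assumption: bypass proxy environment variables so the direct route is tested.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status (for example 401),
        # so the network path itself works.
        return True
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, or the peer closed the
        # connection early (similar to the "EOF" in the log above).
        return False


if __name__ == "__main__":
    # Placeholder address; replace with your own Elasticsearch host and port.
    print(elasticsearch_reachable("http://127.0.0.1:9200", timeout=3.0))
```

Running it from cron every few minutes and comparing the results with the timestamps of the `Client.Timeout exceeded` errors might show whether the timeouts line up with the cluster being unreachable.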