Heartbeat on Amazon Linux just stops after 5mins

Hi,
I have several AWS EC2 instances running Amazon Linux. I have configured heartbeat on them to check 2 ports and 3 web instances. Heartbeat starts and works normally, then after about 5 minutes or a little longer it just dies. There is nothing interesting in the log file. I enabled debug mode, but that didn't show anything interesting either.
Do you have any idea how to figure this one out?
I'm running heartbeat 6.2.2,
uname -a
Linux ip-10-213-103-145 4.9.77-31.58.amzn1.x86_64 #1 SMP Thu Jan 18 22:15:23 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

heartbeat.yml
name: eu-central-1-devl-A-ip-10-216-48-232
heartbeat.monitors:
- type: http
  urls: ["http://localhost:9200","http://localhost:80","http://localhost:8080","https://blah:443"]
  schedule: '@every 30s'
- type: tcp
  hosts: ["localhost"]
  ports: ["5044","4560"]
  schedule: '@every 30s'
setup.template.settings:
  index.number_of_shards: 1
  index.codec: best_compression
setup.kibana:
output.elasticsearch:
  hosts: ["blah:9200"]
  protocol: "http"
  index: active-infra-heartbeat
setup.template.enabled: false
setup.template.name: heartbeat
fields_under_root: true
fields:
  stackletter: A
  stack: es-proxy-logstash-devl-stack-A
  environment: devl
  name: ip-
  region: eu-central-1

thanks,
Tim

Could you share your debug log file?

2018-03-19T16:36:57.773Z INFO instance/beat.go:468 Home path: [/usr/share/heartbeat] Config path: [/etc/heartbeat] Data path: [/var/lib/heartbeat] Logs path: [/var/log/heartbeat]
2018-03-19T16:36:57.773Z DEBUG [beat] instance/beat.go:495 Beat metadata path: /var/lib/heartbeat/meta.json
2018-03-19T16:36:57.773Z INFO instance/beat.go:475 Beat UUID: 082ab290-5052-4772-b811-d49543fdfcbe
2018-03-19T16:36:57.773Z INFO instance/beat.go:213 Setup Beat: heartbeat; Version: 6.2.2
2018-03-19T16:36:57.773Z DEBUG [beat] instance/beat.go:230 Initializing output plugins
2018-03-19T16:36:57.773Z DEBUG [processors] processors/processor.go:49 Processors:
2018-03-19T16:36:57.774Z INFO elasticsearch/client.go:145 Elasticsearch url: http://isgeis-logcentral.dx.deere.com:9200
2018-03-19T16:36:57.774Z INFO pipeline/module.go:76 Beat name: us-east-1-devl-A-ip-10-213-82-63
2018-03-19T16:36:57.774Z WARN beater/heartbeat.go:24 Beta: Heartbeat is beta software
2018-03-19T16:36:57.774Z INFO beater/manager.go:110 Select (active) monitor http
2018-03-19T16:36:57.774Z INFO beater/manager.go:110 Select (active) monitor tcp
2018-03-19T16:36:57.774Z DEBUG [processors] processors/processor.go:49 Processors:
2018-03-19T16:36:57.774Z DEBUG [scheduler] scheduler/scheduler.go:112 Add scheduler job 'http@http://localhost:9200'.
2018-03-19T16:36:57.774Z DEBUG [processors] processors/processor.go:49 Processors:
2018-03-19T16:36:57.774Z DEBUG [scheduler] scheduler/scheduler.go:112 Add scheduler job 'http@http://localhost:80'.
2018-03-19T16:36:57.774Z DEBUG [processors] processors/processor.go:49 Processors:
2018-03-19T16:36:57.774Z DEBUG [scheduler] scheduler/scheduler.go:112 Add scheduler job 'http@http://localhost:8080'.
2018-03-19T16:36:57.774Z DEBUG [processors] processors/processor.go:49 Processors:
2018-03-19T16:36:57.774Z DEBUG [scheduler] scheduler/scheduler.go:112 Add scheduler job 'http@https://search-isg-logcentral-devl-qmvnjqfjaeal6ua3zdqa2tbfla.us-east-1.es.amazonaws.com:443'.
2018-03-19T16:36:57.774Z DEBUG [tcp] tcp/tcp.go:100 Add tcp endpoint 'tcp://localhost'.
2018-03-19T16:36:57.774Z DEBUG [processors] processors/processor.go:49 Processors:
2018-03-19T16:36:57.774Z DEBUG [scheduler] scheduler/scheduler.go:112 Add scheduler job 'tcp-tcp@localhost:[5044 4560]'.
2018-03-19T16:36:57.774Z INFO instance/beat.go:301 heartbeat start running.
2018-03-19T16:36:57.775Z INFO beater/heartbeat.go:56 heartbeat is running! Hit CTRL-C to stop it.
2018-03-19T16:36:57.775Z DEBUG [scheduler] scheduler/scheduler.go:151 Start scheduler.
2018-03-19T16:36:57.775Z DEBUG [scheduler] scheduler/scheduler.go:170 Next wakeup time: 2018-03-19 16:37:27.775099241 +0000 UTC
2018-03-19T16:36:57.775Z INFO [monitoring] log/log.go:97 Starting metrics logging every 30s
2018-03-19T16:37:27.775Z DEBUG [scheduler] scheduler/scheduler.go:303 Start job 'http@http://localhost:9200' at 2018-03-19 16:37:27.77522116 +0000 UTC m=+30.010982955.
2018-03-19T16:37:27.775Z DEBUG [scheduler] scheduler/scheduler.go:303 Start job 'http@http://localhost:80' at 2018-03-19 16:37:27.775308913 +0000 UTC m=+30.011070685.
2018-03-19T16:37:27.775Z DEBUG [scheduler] scheduler/scheduler.go:303 Start job 'http@http://localhost:8080' at 2018-03-19 16:37:27.775323284 +0000 UTC m=+30.011085057.
2018-03-19T16:37:27.775Z DEBUG [scheduler] scheduler/scheduler.go:303 Start job 'http@https://search-isg-logcentral-devl-qmvnjqfjaeal6ua3zdqa2tbfla.us-east-1.es.amazonaws.com:443' at 2018-03-19 16:37:27.775334353 +0000 UTC m=+30.011096127.
2018-03-19T16:37:27.775Z DEBUG [scheduler] scheduler/scheduler.go:303 Start job 'tcp-tcp@localhost:[5044 4560]' at 2018-03-19 16:37:27.775353793 +0000 UTC m=+30.011115572.
2018-03-19T16:37:27.775Z DEBUG [scheduler] scheduler/scheduler.go:170 Next wakeup time: 2018-03-19 16:37:57.775099241 +0000 UTC
2018-03-19T16:37:27.775Z DEBUG [publish] pipeline/processor.go:275 Publish event: {
"@timestamp": "2018-03-19T16:37:27.775Z",
"@metadata": {
"beat": "heartbeat",
"type": "doc",
"version": "6.2.2"
},
"monitor": {
"id": "tcp-tcp@localhost:[5044 4560]",
"name": "tcp",
"duration": {
"us": 209
},
"status": "up",
"ip": "127.0.0.1",
"scheme": "tcp",
"type": "tcp",
"host": "localhost"
},
"type": "monitor",
"environment": "devl",
"name": "ip-10-213-82-63",
"region": "us-east-1",
"stackletter": "A",
"stack": "es-proxy-logstash-devl-stack-A",
"beat": {
"name": "us-east-1-devl-A-ip-10-213-82-63",
"hostname": "ip-10-213-82-63",
"version": "6.2.2"
},
"resolve": {
"rtt": {
"us": 182
},
"host": "localhost",
"ip": "127.0.0.1"
}
}
2018-03-19T16:37:27.775Z DEBUG [scheduler] scheduler/scheduler.go:208 Job 'tcp-tcp@localhost:[5044 4560]' returned at 2018-03-19 16:37:27.775775714 +0000 UTC m=+30.011537530 (cont=2).
2018-03-19T16:37:27.775Z DEBUG [scheduler] scheduler/scheduler.go:262 start returned tasks
2018-03-19T16:37:27.775Z DEBUG [scheduler] scheduler/scheduler.go:170 Next wakeup time: 2018-03-19 16:37:57.775099241 +0000 UTC
2018-03-19T16:37:27.776Z DEBUG [publish] pipeline/processor.go:275 Publish event: {
"@timestamp": "2018-03-19T16:37:27.775Z",
"@metadata": {
"beat": "heartbeat",
"type": "doc",
"version": "6.2.2"
},
"type": "monitor",
"name": "ip-10-213-82-63",
"region": "us-east-1",
"beat": {
"name": "us-east-1-devl-A-ip-10-213-82-63",
"hostname": "ip-10-213-82-63",
"version": "6.2.2"
},
"resolve": {
"ip": "127.0.0.1",
"rtt": {
"us": 182
},
"host": "localhost"
},
"tcp": {
"port": 5044,
"rtt": {
"connect": {
"us": 162
}
}
},
"monitor": {
"scheme": "tcp",
"id": "tcp-tcp@localhost:[5044 4560]",
"name": "tcp",
"type": "tcp",
"ip": "127.0.0.1",
"duration": {
"us": 1276
},
"host": "localhost",
"status": "up"
},
"stack": "es-proxy-logstash-devl-stack-A",
"environment": "devl",
"stackletter": "A"
}

I don't see a way to put the whole log on this post, so here is the last section before it stops working.

2018-03-19T16:37:27.847Z DEBUG [scheduler] scheduler/scheduler.go:208 Job 'http@http://localhost:80' returned at 2018-03-19 16:37:27.847769508 +0000 UTC m=+30.083531299 (cont=0).
2018-03-19T16:37:27.847Z DEBUG [scheduler] scheduler/scheduler.go:170 Next wakeup time: 2018-03-19 16:37:57.775099241 +0000 UTC
2018-03-19T16:37:28.776Z DEBUG [elasticsearch] elasticsearch/client.go:666 ES Ping(url=http://isgeis-logcentral.dx.deere.com:9200)
2018-03-19T16:37:28.885Z DEBUG [elasticsearch] elasticsearch/client.go:689 Ping status code: 200
2018-03-19T16:37:28.885Z INFO elasticsearch/client.go:690 Connected to Elasticsearch version 6.1.2
2018-03-19T16:37:28.948Z DEBUG [elasticsearch] elasticsearch/client.go:303 PublishEvents: 7 events have been published to elasticsearch in 63.637827ms.
2018-03-19T16:37:57.775Z DEBUG [scheduler] scheduler/scheduler.go:303 Start job 'http@http://localhost:9200' at 2018-03-19 16:37:57.775186457 +0000 UTC m=+60.010948221.
2018-03-19T16:37:57.775Z DEBUG [scheduler] scheduler/scheduler.go:303 Start job 'http@http://localhost:80' at 2018-03-19 16:37:57.775249513 +0000 UTC m=+60.011011278.
2018-03-19T16:37:57.775Z DEBUG [scheduler] scheduler/scheduler.go:303 Start job 'http@http://localhost:8080' at 2018-03-19 16:37:57.775261951 +0000 UTC m=+60.011023728.
2018-03-19T16:37:57.775Z DEBUG [scheduler] scheduler/scheduler.go:303 Start job 'http@https://search-isg-logcentral-devl-qmvnjqfjaeal6ua3zdqa2tbfla.us-east-1.es.amazonaws.com:443' at 2018-03-19 16:37:57.775273954 +0000 UTC m=+60.011035730.
2018-03-19T16:37:57.775Z DEBUG [scheduler] scheduler/scheduler.go:303 Start job 'tcp-tcp@localhost:[5044 4560]' at 2018-03-19 16:37:57.775287541 +0000 UTC m=+60.011049310.
2018-03-19T16:37:57.775Z DEBUG [scheduler] scheduler/scheduler.go:170 Next wakeup time: 2018-03-19 16:38:27.775099241 +0000 UTC
2018-03-19T16:37:57.775Z DEBUG [publish] pipeline/processor.go:275 Publish event: {
"@timestamp": "2018-03-19T16:37:57.775Z",
"@metadata": {
"beat": "heartbeat",
"type": "doc",
"version": "6.2.2"
},
"type": "monitor",
"resolve": {
"host": "localhost",
"ip": "127.0.0.1",
"rtt": {
"us": 28
}
},
"name": "ip-10-213-82-63",
"monitor": {
"ip": "127.0.0.1",
"scheme": "tcp",
"id": "tcp-tcp@localhost:[5044 4560]",
"host": "localhost",
"duration": {
"us": 46
},
"status": "up",
"name": "tcp",
"type": "tcp"
},
"stack": "es-proxy-logstash-devl-stack-A",
"environment": "devl",
"region": "us-east-1",
"stackletter": "A",
"beat": {
"name": "us-east-1-devl-A-ip-10-213-82-63",
"hostname": "ip-10-213-82-63",
"version": "6.2.2"
}
}
2018-03-19T16:37:57.775Z DEBUG [scheduler] scheduler/scheduler.go:208 Job 'tcp-tcp@localhost:[5044 4560]' returned at 2018-03-19 16:37:57.775672789 +0000 UTC m=+60.011434573 (cont=2).
2018-03-19T16:37:57.775Z DEBUG [scheduler] scheduler/scheduler.go:262 start returned tasks
2018-03-19T16:37:57.775Z DEBUG [scheduler] scheduler/scheduler.go:170 Next wakeup time: 2018-03-19 16:38:27.775099241 +0000 UTC
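
To pin down exactly when the process stopped logging, and whether the 30s schedule was still being met right up to that point, the timestamps can be pulled out of the log file. This is only a rough sketch: the helper names are made up for illustration, and it assumes every log entry starts with a timestamp in the format shown above (the JSON body lines of published events are skipped).

```python
import re
from datetime import datetime, timedelta

# Matches the leading timestamp of entries such as
# "2018-03-19T16:37:57.775Z DEBUG [scheduler] ..."
TS_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3})Z\s")


def log_timestamps(lines):
    """Return the timestamps of all lines that begin with one."""
    stamps = []
    for line in lines:
        m = TS_RE.match(line)
        if m:
            stamps.append(datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f"))
    return stamps


def summarize(lines):
    """Return (first, last, largest_gap) for a log, or None if no entry matched."""
    stamps = log_timestamps(lines)
    if not stamps:
        return None
    gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
    return stamps[0], stamps[-1], max(gaps, default=timedelta(0))
```

Feeding it the lines of the log file under /var/log/heartbeat (the exact file name is an assumption here) gives the last moment heartbeat wrote anything. A largest gap of roughly 30s suggests the scheduler was healthy until the process disappeared.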

There is no shutdown or anything in the logs which is kind of strange. It would mean something is killing heartbeat. What else do you have installed on the image?
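
One thing that could kill a process without it logging anything is the kernel's OOM killer, so it may be worth scanning the kernel log for it. The sketch below is only a guess at a useful check, not a diagnosis: the function name is hypothetical, and the message wording in the pattern is what kernels around 4.9 typically print, which may differ on other versions.

```python
import re

# Typical OOM-killer messages look roughly like (illustrative, not from this host):
#   "Out of memory: Kill process <pid> (<name>) score ... or sacrifice child"
#   "Killed process <pid> (<name>) total-vm:..."
# The exact wording varies between kernel versions, so treat this pattern as an assumption.
KILL_RE = re.compile(
    r"(?:Out of memory: Kill(?:ed)? process|Killed process) (\d+) \(([^)]+)\)"
)


def find_kills(kernel_log_lines, process_name="heartbeat"):
    """Return (pid, line) pairs for kernel log lines reporting a kill of process_name."""
    hits = []
    for line in kernel_log_lines:
        m = KILL_RE.search(line)
        if m and m.group(2) == process_name:
            hits.append((int(m.group(1)), line.strip()))
    return hits
```

The input can be the output of dmesg split into lines. An empty result does not rule out a kill: a SIGKILL sent by another process or by the init system would not show up in the kernel log.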

To post longer logs, best paste them into a gist and share them here.

That's what I was thinking. I will test with a fresh Amazon Linux AMI from AWS, without our company stuff on it, and see what happens.
--Tim

Hi,
I was able to test this with the latest Amazon Linux AMI, ami-1853ac65.
The behavior is the same: it runs for about 5 minutes and then shuts down. My build now installs version 6.2.3 of heartbeat.
Any ideas?
Thanks,
Tim
