Heartbeat i/o timeout

ECE 2.3
RHEL 7.6 (Maipo)
Redis server v=5.0.3

I'm having trouble with Heartbeat connecting to an application of ours. From what I've researched, Redis should be listening on ports 6379, 6380, 16379 and 16380, and it appears to be, judging by the output of lsof -i -P -n | grep LISTEN on the Redis servers. However, when Heartbeat is set up to check our Redis servers, it reports an i/o timeout. It seems like I'm missing something here, since the other portions of our Heartbeat yaml are working just fine.

2019-08-29T09:27:54.527-0400	DEBUG	[scheduler]	scheduler/scheduler.go:248	Job 'auto-http-' returned at 2019-08-29 09:27:54.5277476 -0400 EDT m=+76232.691697201 (cont=0).
2019-08-29T09:27:54.527-0400	DEBUG	[tcp]	tcp/task.go:43	dial failed with: dial tcp xx.xx.xx.136:6379: i/o timeout
2019-08-29T09:27:54.527-0400	DEBUG	[scheduler]	scheduler/scheduler.go:206	Next wakeup time: 2019-08-29 09:28:08.5111252 -0400 EDT
2019-08-29T09:27:54.527-0400	DEBUG	[processors]	processing/processors.go:183	Publish event: {
  "@timestamp": "2019-08-29T13:27:38.524Z",
  "@metadata": {
    "beat": "",
    "type": "_doc",
    "version": ""
  },
  "tags": [
    "DEV"
  ],
  "agent": {
    "version": "7.0.0",
    "type": "heartbeat",
    "ephemeral_id": "<id>",
    "hostname": "<server_name>",
    "id": "<id>"
  },
  "monitor": {
    "ip": "xx.xx.xx.137",
    "status": "down",
    "duration": {
      "us": 16001900
    },
    "id": "<id>",
    "name": "<name>",
    "type": "tcp",
    "check_group": "<group>"
  },
  "summary": {
    "down": 1,
    "up": 0
  },
  "source": "Heartbeat",
  "host": {
    "id": "<id>",
    "hostname": "<server_name>",
    "architecture": "x86_64",
    "os": {
      "build": "14393.3085",
      "platform": "windows",
      "version": "10.0",
      "family": "windows",
      "name": "Windows Server 2016 Standard",
      "kernel": "10.0.14393.3085 (rs1_release.190703-1816)"
    },
    "name": "<server_name>"
  },
  "error": {
    "type": "io",
    "message": "dial tcp xx.xx.xx.137:6379: i/o timeout"
  },
  "event": {
    "dataset": "uptime"
  },
  "env": "DEV",
  "application": "<name>",
  "layer": "APP2",
  "url": {
    "domain": "xx.xx.xx.137",
    "port": 6379,
    "full": "tcp://xx.xx.xx.137:6379",
    "scheme": "tcp"
  },
  "project": "<name>",
  "ecs": {
    "version": "1.0.0"
  }
}

Heartbeat yaml portion:

- type: tcp

  # List of hosts to query
  hosts: ["xx.xx.xx.136:6379","xx.xx.xx.137:6379","xx.xx.xx.149:6379"]
  name: <name>
  tags: ["DEV"]
  fields: 
    env: DEV
    project: <name>
    application: <name>
    layer: APP2
    source: Heartbeat
  fields_under_root: True
  
  # Configure task schedule
  schedule: '@every 15s'

Hi Ryan,

An I/O timeout usually points to a network problem. Some things to check:

  • Is it possible that Redis is not listening on these IPs (for example, if it is only listening on local interfaces)?
  • Is there any firewall that could be affecting communication between the Heartbeat and Redis hosts?
  • Could you check whether you can connect from the host where Heartbeat is running to one of the Redis ports?
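
If telnet or nc is not available on the Heartbeat host, the last check can be done with a short script that makes a plain TCP connection attempt, which is roughly what the tcp monitor does. This is only a minimal sketch, assuming Python 3 is installed on that host. The function name check_tcp, the script name and the 5 second default timeout are illustrative choices, not anything taken from Heartbeat.

```python
import socket
import sys


def check_tcp(host, port, timeout=5.0):
    """Attempt a plain TCP connect and return (ok, detail).

    Hypothetical helper for illustration; not part of Heartbeat.
    """
    try:
        # create_connection resolves the host and opens a TCP connection.
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except socket.timeout:
        # No answer at all within the timeout.
        return False, "timeout"
    except ConnectionRefusedError:
        # The host answered, but nothing accepted the connection.
        return False, "refused"
    except OSError as exc:
        # Anything else: unreachable network, DNS failure, and so on.
        return False, f"error: {exc}"


if __name__ == "__main__":
    # Usage: python check_tcp.py HOST:PORT [HOST:PORT ...]
    for target in sys.argv[1:]:
        host, _, port = target.rpartition(":")
        ok, detail = check_tcp(host, int(port))
        print(f"{target}: {'up' if ok else 'down'} ({detail})")
```

It has to be run on the Heartbeat host itself, for example as python check_tcp.py xx.xx.xx.136:6379. As a rule of thumb, a "timeout" result suggests packets are being dropped somewhere along the way (a firewall or similar), while "refused" suggests the host is reachable but nothing is listening on that address and port.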

Going through your troubleshooting steps now; I should be able to respond with more info tomorrow. I've already realized that I somehow used the cmd prompt on my desktop computer instead of the tab in MobaXterm to telnet to the Redis server. Of course it works from my desktop but not from the server that I should have been testing on, so there's problem #1 that I need to address.

While it's not 100% confirmed yet, I believe it's a Unisys Stealth issue between our servers that's causing the lack of comms. Appreciate the help and quick response, Jaime. Enjoy the rest of your day.
-Ryan

Update
The Stealth settings needed to be updated, which fixed the problem.

@Ryan_Downey glad it got fixed, and thanks for leaving the update here!
