Hi Team,
The servers (Beats, Logstash servers, etc.) were rebooted as part of a patching activity.
Afterwards, I noticed that the filebeat and metricbeat services had failed, even though they are enabled to start at boot. Only the heartbeat service was up.
The server was rebooted at around 10:12:
[root@<hostname> ~]# who -b
 system boot  2021-10-21 10:12
[root@<hostname> ~]# uptime
20:02:49 up  9:50,  2 users,  load average: 0.16, 0.16, 0.10
[root@<hostname> ~]# date
Thu Oct 21 20:02:50 +03 2021
I. Error connecting to Kibana
filebeat logs:
From 10:13, filebeat logged the error messages below for about 20 seconds (10:13:22 to 10:13:41):
[root@<hostname> ~]# cat /var/log/messages |grep filebeat | grep 'http://<kibana_server1>:<kibana_port>/api/status fails'
Oct 21 10:13:22 <hostname> filebeat: 2021-10-21T10:13:22.261+0300#011ERROR#011instance/beat.go:989#011Exiting: error connecting to Kibana: fail to get the Kibana version: HTTP GET request to http://<kibana_server1>:<kibana_port>/api/status fails: fail to execute the HTTP GET request: Get "http://<kibana_server1>:<kibana_port>/api/status": dial tcp <kibana_server1>:<kibana_port>: connect: connection refused. Response: .
...
Oct 21 10:13:41 <hostname> filebeat: Exiting: error connecting to Kibana: fail to get the Kibana version: HTTP GET request to http://<kibana_server1>:<kibana_port>/api/status fails: fail to execute the HTTP GET request: Get "http://<kibana_server1>:<kibana_port>/api/status": dial tcp <kibana_server1>:<kibana_port>: connect: connection refused. Response: .
metricbeat logs:
[root@<hostname> ~]# cat /var/log/messages |grep metricbeat | grep 'http://<kibana_server1>:<kibana_port>/api/status fails'
Oct 21 10:13:24 <hostname> metricbeat: 2021-10-21T10:13:24.397+0300#011ERROR#011instance/beat.go:989#011Exiting: error connecting to Kibana: fail to get the Kibana version: HTTP GET request to http://<kibana_server1>:<kibana_port>/api/status fails: fail to execute the HTTP GET request: Get "http://<kibana_server1>:<kibana_port>/api/status": dial tcp <kibana_server1>:<kibana_port>: connect: connection refused. Response: .
...
Oct 21 10:13:43 <hostname> metricbeat: Exiting: error connecting to Kibana: fail to get the Kibana version: HTTP GET request to http://<kibana_server1>:<kibana_port>/api/status fails: fail to execute the HTTP GET request: Get "http://<kibana_server1>:<kibana_port>/api/status": dial tcp <kibana_server1>:<kibana_port>: connect: connection refused. Response: .
heartbeat has no such log entries; the same grep returns nothing:
[root@<hostname> ~]# cat /var/log/messages |grep heartbeat | grep 'http://<kibana_server1>:<kibana_port>/api/status fails'
However, all three Beats have also had failures connecting to Logstash.
II. Error connecting to Logstash
filebeat logs:
[root@<hostname> ~]# cat /var/log/messages |grep filebeat | grep  error -i
Oct 18 10:12:29 <hostname> filebeat: 2021-10-18T10:12:29.897+0300#011ERROR#011[logstash]#011logstash/async.go:280#011Failed to publish events caused by: write tcp <App_server1>:50770-><logstash_server2>:<logstash_port>: write: connection reset by peer
Oct 18 10:12:29 <hostname> filebeat: 2021-10-18T10:12:29.955+0300#011ERROR#011[logstash]#011logstash/async.go:280#011Failed to publish events caused by: write tcp <App_server1>:43848-><logstash_server1>:<logstash_port>: write: connection reset by peer
metricbeat logs:
Oct 20 10:02:38 <hostname> metricbeat: 2021-10-20T10:02:38.799+0300#011ERROR#011[logstash]#011logstash/async.go:280#011Failed to publish events caused by: write tcp <App_Server1>:58816-><logstash_server1>:<logstash_port>: write: connection reset by peer
Oct 20 10:17:21 <hostname> metricbeat: 2021-10-20T10:17:21.063+0300#011ERROR#011[publisher_pipeline_output]#011pipeline/output.go:154#011Failed to connect to backoff(async(tcp://<logstash_server2>:<logstash_port>)): dial tcp <logstash_server2>:<logstash_port>: connect: connection refused
heartbeat logs:
Oct 20 10:02:42 <hostname> heartbeat: 2021-10-20T10:02:42.295+0300#011ERROR#011[logstash]#011logstash/async.go:280#011Failed to publish events caused by: write tcp <App_Server1>:54306-><logstash_server1>:<logstash_port>: write: connection reset by peer
Oct 20 10:16:44 <hostname> heartbeat: 2021-10-20T10:16:44.690+0300#011ERROR#011[publisher_pipeline_output]#011pipeline/output.go:180#011failed to publish events: write tcp <App_Server1>:32998-><logstash_server2>:<logstash_port>: write: connection reset by peer
III. Service status output
filebeat failed:
[root@<hostname> ~]# systemctl status filebeat.service
● filebeat.service - Filebeat sends log files to Logstash or directly to Elasticsearch.
   Loaded: loaded (/usr/lib/systemd/system/filebeat.service; enabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Thu 2021-10-21 10:13:41 +03; 10h ago
...
Oct 21 10:13:41 <hostname> systemd[1]: filebeat.service: main process exited, code=exited, status=1/FAILURE
Oct 21 10:13:41 <hostname> systemd[1]: Unit filebeat.service entered failed state.
Oct 21 10:13:41 <hostname> systemd[1]: filebeat.service failed.
Similarly, metricbeat failed:
[root@<hostname> ~]# systemctl status metricbeat.service
● metricbeat.service - Metricbeat is a lightweight shipper for metrics.
   Loaded: loaded (/usr/lib/systemd/system/metricbeat.service; enabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Thu 2021-10-21 10:13:44 +03; 10h ago
...
Oct 21 10:13:43 <hostname> systemd[1]: Unit metricbeat.service entered failed state.
Oct 21 10:13:43 <hostname> systemd[1]: metricbeat.service failed.
As mentioned, the heartbeat service was up and running:
[root@<hostname> ~]# systemctl status heartbeat-elastic.service
● heartbeat-elastic.service - Ping remote services for availability and log results to Elasticsearch or send to Logstash.
   Loaded: loaded (/usr/lib/systemd/system/heartbeat-elastic.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2021-10-21 10:13:01 +03; 10h ago
IV. Beats config files
heartbeat
[root@<hostname> ~]# cat /etc/heartbeat/heartbeat.yml
name: app_server1
fields_under_root: true
fields:
    host_id: app_server1
heartbeat.monitors:
  - type: http
    name: api-server1_app_server1
    enabled: true
    urls: ["http://<AppServer1_IP>:88/status"]
    schedule: '@every 10s'
    fields_under_root: true
    fields:
      app_id: api-server1-app_server1
  - type: http
    name: api-server2_app_server2
    enabled: true
    urls: ["http://<AppServer2_IP>:88/status"]
    schedule: '@every 10s'
    fields_under_root: true
    fields:
      app_id: api-server2-app_server2
setup.kibana:
  host: "http://<kibana_server1>:<kibana_port>"
  username: elastic
  password: ${es_pwd}
output.logstash:
  hosts: ['<logstash_server1>:<logstash_port>', '<logstash_server2>:<logstash_port>']
  loadbalance: true
filebeat
[root@<hostname> ~]# cat /etc/filebeat/filebeat.yml
 
name: app_server1
filebeat.inputs:
    - type: log
      fields_under_root: true
      fields:
         log_type:  api_app_server1
         app_id: node
      paths:
        - /var/log/api/server.log
        - /var/log/api/server-err.log
    - type: log
      fields_under_root: true
      fields:
         log_type:  spa_app_server1
         app_id: node
      paths:
        - /var/log/spa/server.log
        - /var/log/spa/server-err.log
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~
setup.dashboards.enabled: true
setup.kibana:
  host: "http://<kibana_server1>:<kibana_port>"
  username: elastic
  password: ${es_pwd}
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: true
output.logstash:
  hosts: ['<logstash_server1>:<logstash_port>', '<logstash_server2>:<logstash_port>']
  loadbalance: true
[root@<hostname> ~]#
metricbeat
[root@<hostname> ~]# cat /etc/metricbeat/metricbeat.yml
name: app_server1
fields_under_root: true
fields:
  host_id: app_server1
metricbeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: true
setup.dashboards.enabled: true
setup.kibana:
  host: "http://<kibana_server1>:<kibana_port>"
  username: elastic
  password: ${es_pwd}
output.logstash:
  hosts: ['<logstash_server1>:<logstash_port>', '<logstash_server2>:<logstash_port>'] 
  loadbalance: True
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~
[root@<hostname> ~]#
I am checking whether there is a way to avoid this issue.
i) In Kibana --> Dashboards, if we search for "filebeat", we see lots of Filebeat dashboards (e.g. [Filebeat MySQL] Overview ECS). Only filebeat and metricbeat have dashboards there; there is none for heartbeat, which may be why heartbeat did not even try to connect to Kibana and its service stayed up. Are these dashboards loaded because of the setup.dashboards configuration in the Beats config files above?
If yes, does this setting need to stay in the Beats config files permanently? I think the dashboards can be loaded just once from the command line instead.
If that is possible, can we remove the config below (if it is only there to load the dashboards)? That would at least avoid the Beats failing to start when the Kibana server is not available:
setup.dashboards.enabled: true
setup.kibana:
  host: "http://<kibana_server1>:<kibana_port>"
  username: elastic
  password: ${es_pwd}
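A minimal sketch of what the Kibana-related part of filebeat.yml and metricbeat.yml could look like instead, assuming the dashboards are loaded manually once (for example after an install or upgrade) while Kibana is up. This is consistent with what we see for heartbeat: heartbeat.yml above has a setup.kibana section but no setup.dashboards.enabled line, and heartbeat was the only Beat that started without trying to reach Kibana. The one-time load would be `filebeat setup --dashboards` and `metricbeat setup --dashboards`, and the Kibana settings can also be passed on the command line with `-E` (e.g. `-E setup.kibana.host=...`). Because our output is Logstash, the Elastic "Load Kibana dashboards" page for our version should be checked first, since, as far as I remember, it describes extra `-E` overrides for the Logstash-output case.

```yaml
# Sketch (assumption): dashboards are loaded once via "<beat> setup --dashboards",
# so the service no longer needs Kibana to be reachable at startup.
# Either delete this line or set it explicitly to false.
setup.dashboards.enabled: false

# Kept only for the one-off "setup --dashboards" runs; as far as I can tell it
# is not needed for shipping events to Logstash (heartbeat has it too and stayed up).
setup.kibana:
  host: "http://<kibana_server1>:<kibana_port>"
  username: elastic
  password: ${es_pwd}
```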
For now I am not considering increasing the timeout value etc., as that could still cause the problem if the Kibana server is unavailable for longer than the timeout.
There are also issues connecting to Logstash, but I would like to discuss those later so that we handle one issue at a time.
Thanks,