Hi Team,
There was an activity to reboot the servers (the beats servers, logstash servers, etc.) for patching.
After this I noticed that the filebeat and metricbeat services had failed (they are enabled to start on reboot); only the heartbeat service was up. The server was rebooted around 10:12:
[root@<hostname> ~]# who -b
system boot 2021-10-21 10:12
[root@<hostname> ~]# uptime
20:02:49 up 9:50, 2 users, load average: 0.16, 0.16, 0.10
[root@<hostname> ~]# date
Thu Oct 21 20:02:50 +03 2021
I. Error connecting to Kibana

filebeat logs: starting at 10:13, the error messages below kept appearing for several seconds.
[root@<hostname> ~]# cat /var/log/messages |grep filebeat | grep 'http://<kibana_server1>:<kibana_port>/api/status fails'
Oct 21 10:13:22 <hostname> filebeat: 2021-10-21T10:13:22.261+0300#011ERROR#011instance/beat.go:989#011Exiting: error connecting to Kibana: fail to get the Kibana version: HTTP GET request to http://<kibana_server1>:<kibana_port>/api/status fails: fail to execute the HTTP GET request: Get "http://<kibana_server1>:<kibana_port>/api/status": dial tcp <kibana_server1>:<kibana_port>: connect: connection refused. Response: .
.
.
.
Oct 21 10:13:41 <hostname> filebeat: Exiting: error connecting to Kibana: fail to get the Kibana version: HTTP GET request to http://<kibana_server1>:<kibana_port>/api/status fails: fail to execute the HTTP GET request: Get "http://<kibana_server1>:<kibana_port>/api/status": dial tcp <kibana_server1>:<kibana_port>: connect: connection refused. Response: .
metricbeat logs:
[root@<hostname> ~]# cat /var/log/messages |grep metricbeat | grep 'http://<kibana_server1>:<kibana_port>/api/status fails'
Oct 21 10:13:24 <hostname> metricbeat: 2021-10-21T10:13:24.397+0300#011ERROR#011instance/beat.go:989#011Exiting: error connecting to Kibana: fail to get the Kibana version: HTTP GET request to http://<kibana_server1>:<kibana_port>/api/status fails: fail to execute the HTTP GET request: Get "http://<kibana_server1>:<kibana_port>/api/status": dial tcp <kibana_server1>:<kibana_port>: connect: connection refused. Response: .
.
.
.
Oct 21 10:13:43 <hostname> metricbeat: Exiting: error connecting to Kibana: fail to get the Kibana version: HTTP GET request to http://<kibana_server1>:<kibana_port>/api/status fails: fail to execute the HTTP GET request: Get "http://<kibana_server1>:<kibana_port>/api/status": dial tcp <kibana_server1>:<kibana_port>: connect: connection refused. Response: .
heartbeat has no such logs:
[root@<hostname> ~]# cat /var/log/messages |grep heartbeat | grep 'http://<kibana_server1>:<kibana_port>/api/status fails'
However, all three beats also failed to connect to Logstash.
II. Error connecting to Logstash

filebeat logs:
[root@<hostname> ~]# cat /var/log/messages |grep filebeat | grep error -i
Oct 18 10:12:29 <hostname> filebeat: 2021-10-18T10:12:29.897+0300#011ERROR#011[logstash]#011logstash/async.go:280#011Failed to publish events caused by: write tcp <App_server1>:50770-><logstash_server2>:<logstash_port>: write: connection reset by peer
Oct 18 10:12:29 <hostname> filebeat: 2021-10-18T10:12:29.955+0300#011ERROR#011[logstash]#011logstash/async.go:280#011Failed to publish events caused by: write tcp <App_server1>:43848-><logstash_server1>:<logstash_port>: write: connection reset by peer
metricbeat logs:
Oct 20 10:02:38 <hostname> metricbeat: 2021-10-20T10:02:38.799+0300#011ERROR#011[logstash]#011logstash/async.go:280#011Failed to publish events caused by: write tcp <App_Server1>:58816-><logstash_server1>:<logstash_port>: write: connection reset by peer
Oct 20 10:17:21 <hostname> metricbeat: 2021-10-20T10:17:21.063+0300#011ERROR#011[publisher_pipeline_output]#011pipeline/output.go:154#011Failed to connect to backoff(async(tcp://<logstash_server2>:<logstash_port>)): dial tcp <logstash_server2>:<logstash_port>: connect: connection refused
heartbeat logs:
Oct 20 10:02:42 <hostname> heartbeat: 2021-10-20T10:02:42.295+0300#011ERROR#011[logstash]#011logstash/async.go:280#011Failed to publish events caused by: write tcp <App_Server1>:54306-><logstash_server1>:<logstash_port>: write: connection reset by peer
Oct 20 10:16:44 <hostname> heartbeat: 2021-10-20T10:16:44.690+0300#011ERROR#011[publisher_pipeline_output]#011pipeline/output.go:180#011failed to publish events: write tcp <App_Server1>:32998-><logstash_server2>:<logstash_port>: write: connection reset by peer
III. Service status output

filebeat failed:
[root@<hostname> ~]# systemctl status filebeat.service
● filebeat.service - Filebeat sends log files to Logstash or directly to Elasticsearch.
Loaded: loaded (/usr/lib/systemd/system/filebeat.service; enabled; vendor preset: disabled)
Active: failed (Result: start-limit) since Thu 2021-10-21 10:13:41 +03; 10h ago
.
.
Oct 21 10:13:41 <hostname> systemd[1]: filebeat.service: main process exited, code=exited, status=1/FAILURE
Oct 21 10:13:41 <hostname> systemd[1]: Unit filebeat.service entered failed state.
Oct 21 10:13:41 <hostname> systemd[1]: filebeat.service failed.
Similarly, metricbeat failed:
[root@<hostname> ~]# systemctl status metricbeat.service
● metricbeat.service - Metricbeat is a lightweight shipper for metrics.
Loaded: loaded (/usr/lib/systemd/system/metricbeat.service; enabled; vendor preset: disabled)
Active: failed (Result: start-limit) since Thu 2021-10-21 10:13:44 +03; 10h ago
.
.
Oct 21 10:13:43 <hostname> systemd[1]: Unit metricbeat.service entered failed state.
Oct 21 10:13:43 <hostname> systemd[1]: metricbeat.service failed.
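For what it's worth, "Result: start-limit" above means systemd gave up after the beat exited several times in quick succession while Kibana was still down. One possible mitigation, if we want the services to recover on their own — a rough sketch, untested here; the drop-in path and the retry interval are my assumptions — is a systemd override so the unit keeps retrying instead of giving up at the start limit:

```shell
# Sketch (untested): drop-in override so filebeat keeps retrying until
# Kibana/Logstash become reachable, instead of hitting the start limit.
sudo mkdir -p /etc/systemd/system/filebeat.service.d
sudo tee /etc/systemd/system/filebeat.service.d/restart.conf <<'EOF'
[Service]
Restart=on-failure
RestartSec=30
# Older systemd (e.g. el7) takes the limit here; newer versions use
# StartLimitIntervalSec= in the [Unit] section instead.
StartLimitInterval=0
EOF
sudo systemctl daemon-reload
sudo systemctl restart filebeat.service
```

A matching drop-in would be needed for metricbeat.service as well.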
As mentioned, the heartbeat service was up and running:
[root@<hostname> ~]# systemctl status heartbeat-elastic.service
● heartbeat-elastic.service - Ping remote services for availability and log results to Elasticsearch or send to Logstash.
Loaded: loaded (/usr/lib/systemd/system/heartbeat-elastic.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2021-10-21 10:13:01 +03; 10h ago
IV. Beats config files

heartbeat
[root@<hostname> ~]# cat /etc/heartbeat/heartbeat.yml
name: app_server1
fields_under_root: true
fields:
  host_id: app_server1
heartbeat.monitors:
- type: http
  name: api-server1_app_server1
  enabled: true
  urls: ["http://<AppServer1_IP>:88/status"]
  schedule: '@every 10s'
  fields_under_root: true
  fields:
    app_id: api-server1-app_server1
- type: http
  name: api-server2_app_server2
  enabled: true
  urls: ["http://<AppServer2_IP>:88/status"]
  schedule: '@every 10s'
  fields_under_root: true
  fields:
    app_id: api-server2-app_server2
setup.kibana:
  host: "http://<kibana_server1>:<kibana_port>"
  username: elastic
  password: ${es_pwd}
output.logstash:
  hosts: ['<logstash_server1>:<logstash_port>', '<logstash_server2>:<logstash_port>']
  loadbalance: true
filebeat
[root@<hostname> ~]# cat /etc/filebeat/filebeat.yml
name: app_server1
filebeat.inputs:
- type: log
  fields_under_root: true
  fields:
    log_type: api_app_server1
    app_id: node
  paths:
    - /var/log/api/server.log
    - /var/log/api/server-err.log
- type: log
  fields_under_root: true
  fields:
    log_type: spa_app_server1
    app_id: node
  paths:
    - /var/log/spa/server.log
    - /var/log/spa/server-err.log
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~
setup.dashboards.enabled: true
setup.kibana:
  host: "http://<kibana_server1>:<kibana_port>"
  username: elastic
  password: ${es_pwd}
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: true
output.logstash:
  hosts: ['<logstash_server1>:<logstash_port>', '<logstash_server2>:<logstash_port>']
  loadbalance: true
[root@<hostname> ~]#
metricbeat
[root@<hostname> ~]# cat /etc/metricbeat/metricbeat.yml
name: app_server1
fields_under_root: true
fields:
  host_id: app_server1
metricbeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: true
setup.dashboards.enabled: true
setup.kibana:
  host: "http://<kibana_server1>:<kibana_port>"
  username: elastic
  password: ${es_pwd}
output.logstash:
  hosts: ['<logstash_server1>:<logstash_port>', '<logstash_server2>:<logstash_port>']
  loadbalance: true
processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~
[root@<hostname> ~]#
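Separate from the configs themselves, the beats ship test subcommands that may help confirm things after a reboot (a sketch; assumes the stock binaries and the config paths shown above):

```shell
# Validate the config file and probe the configured output (Logstash here).
filebeat test config -c /etc/filebeat/filebeat.yml
filebeat test output -c /etc/filebeat/filebeat.yml
metricbeat test config -c /etc/metricbeat/metricbeat.yml
metricbeat test output -c /etc/metricbeat/metricbeat.yml
```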
I am checking whether there is a way to avoid this issue.
i) In Kibana --> Dashboards, typing "filebeat" in the search box shows many filebeat dashboards (e.g. [Filebeat MySQL] Overview ECS). Only filebeat and metricbeat dashboards appear; there is no dashboard for heartbeat (maybe that is why heartbeat did not even try to connect to Kibana and its service stayed up). Are these dashboards loaded because of the setup.dashboards configuration in the beats config files above? If so, does that configuration need to stay in the config files permanently? I think the dashboards can be loaded just once from the command line.
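If a one-time load is acceptable, my understanding is it could be done from the command line roughly like this (a sketch, not verified on our servers; host and credentials are the same placeholders as in the configs):

```shell
# One-off dashboard import; afterwards setup.dashboards.enabled and
# setup.kibana could arguably be dropped from the running config.
filebeat setup --dashboards \
  -E setup.kibana.host="http://<kibana_server1>:<kibana_port>" \
  -E setup.kibana.username=elastic \
  -E setup.kibana.password="${es_pwd}"
metricbeat setup --dashboards \
  -E setup.kibana.host="http://<kibana_server1>:<kibana_port>" \
  -E setup.kibana.username=elastic \
  -E setup.kibana.password="${es_pwd}"
```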
If that is possible, can we remove the config below (assuming it is only there to load the dashboards)? That would at least avoid the issue of the beats failing to start when the Kibana server is unavailable.
setup.dashboards.enabled: true
setup.kibana:
  host: "http://<kibana_server1>:<kibana_port>"
  username: elastic
  password: ${es_pwd}
For now I am not considering increasing the timeout value, since the problem would still occur if the Kibana server stays unavailable beyond that value.
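If we ever do want the beats to wait rather than exit, an alternative to a bigger timeout — purely a sketch, using the placeholder host — would be to gate the service start on Kibana actually answering, e.g. from an ExecStartPre= hook:

```shell
# Poll Kibana's status endpoint until it responds, so the beat only
# starts once Kibana is reachable (waits indefinitely as written).
until curl -sf "http://<kibana_server1>:<kibana_port>/api/status" >/dev/null; do
  sleep 5
done
```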
There are also issues connecting to Logstash, but we can discuss those later, one issue at a time.
Thanks,