Metricbeat and Filebeat services failed to start after the Logstash and Kibana servers were rebooted (v7.14)

Hi Team,

The servers (Beats hosts, Logstash servers, etc.) were rebooted for patching.

After this, I noticed that the filebeat and metricbeat services had failed (they are enabled to start on boot). Only the heartbeat service was up.

The server was rebooted at around 10:12.

[root@<hostname> ~]# who -b
 system boot  2021-10-21 10:12

[root@<hostname> ~]# uptime
20:02:49 up  9:50,  2 users,  load average: 0.16, 0.16, 0.10

[root@<hostname> ~]# date
Thu Oct 21 20:02:50 +03 2021

I. Error connecting to kibana,

filebeat logs,
Starting at 10:13:22, the error below was logged repeatedly for about 20 seconds (last entry at 10:13:41).

[root@<hostname> ~]# cat /var/log/messages |grep filebeat | grep 'http://<kibana_server1>:<kibana_port>/api/status fails'
Oct 21 10:13:22 <hostname> filebeat: 2021-10-21T10:13:22.261+0300#011ERROR#011instance/beat.go:989#011Exiting: error connecting to Kibana: fail to get the Kibana version: HTTP GET request to http://<kibana_server1>:<kibana_port>/api/status fails: fail to execute the HTTP GET request: Get "http://<kibana_server1>:<kibana_port>/api/status": dial tcp <kibana_server1>:<kibana_port>: connect: connection refused. Response: .
.
. 
. 
Oct 21 10:13:41 <hostname> filebeat: Exiting: error connecting to Kibana: fail to get the Kibana version: HTTP GET request to http://<kibana_server1>:<kibana_port>/api/status fails: fail to execute the HTTP GET request: Get "http://<kibana_server1>:<kibana_port>/api/status": dial tcp <kibana_server1>:<kibana_port>: connect: connection refused. Response: .

metricbeat logs,

[root@<hostname> ~]# cat /var/log/messages |grep metricbeat | grep 'http://<kibana_server1>:<kibana_port>/api/status fails'
Oct 21 10:13:24 <hostname> metricbeat: 2021-10-21T10:13:24.397+0300#011ERROR#011instance/beat.go:989#011Exiting: error connecting to Kibana: fail to get the Kibana version: HTTP GET request to http://<kibana_server1>:<kibana_port>/api/status fails: fail to execute the HTTP GET request: Get "http://<kibana_server1>:<kibana_port>/api/status": dial tcp <kibana_server1>:<kibana_port>: connect: connection refused. Response: .
.
.
.
Oct 21 10:13:43 <hostname> metricbeat: Exiting: error connecting to Kibana: fail to get the Kibana version: HTTP GET request to http://<kibana_server1>:<kibana_port>/api/status fails: fail to execute the HTTP GET request: Get "http://<kibana_server1>:<kibana_port>/api/status": dial tcp <kibana_server1>:<kibana_port>: connect: connection refused. Response: .

heartbeat has no such logs.

[root@<hostname> ~]# cat /var/log/messages |grep heartbeat | grep 'http://<kibana_server1>:<kibana_port>/api/status fails'

However, all three Beats also failed to connect to Logstash.

II. Error connecting to logstash,

filebeat logs,

[root@<hostname> ~]# cat /var/log/messages |grep filebeat | grep  error -i

Oct 18 10:12:29 <hostname> filebeat: 2021-10-18T10:12:29.897+0300#011ERROR#011[logstash]#011logstash/async.go:280#011Failed to publish events caused by: write tcp <App_server1>:50770-><logstash_server2>:<logstash_port>: write: connection reset by peer

Oct 18 10:12:29 <hostname> filebeat: 2021-10-18T10:12:29.955+0300#011ERROR#011[logstash]#011logstash/async.go:280#011Failed to publish events caused by: write tcp <App_server1>:43848-><logstash_server1>:<logstash_port>: write: connection reset by peer

metricbeat logs,

Oct 20 10:02:38 <hostname> metricbeat: 2021-10-20T10:02:38.799+0300#011ERROR#011[logstash]#011logstash/async.go:280#011Failed to publish events caused by: write tcp <App_Server1>:58816-><logstash_server1>:<logstash_port>: write: connection reset by peer

Oct 20 10:17:21 <hostname> metricbeat: 2021-10-20T10:17:21.063+0300#011ERROR#011[publisher_pipeline_output]#011pipeline/output.go:154#011Failed to connect to backoff(async(tcp://<logstash_server2>:<logstash_port>)): dial tcp <logstash_server2>:<logstash_port>: connect: connection refused

heartbeat logs,

Oct 20 10:02:42 <hostname> heartbeat: 2021-10-20T10:02:42.295+0300#011ERROR#011[logstash]#011logstash/async.go:280#011Failed to publish events caused by: write tcp <App_Server1>:54306-><logstash_server1>:<logstash_port>: write: connection reset by peer

Oct 20 10:16:44 <hostname> heartbeat: 2021-10-20T10:16:44.690+0300#011ERROR#011[publisher_pipeline_output]#011pipeline/output.go:180#011failed to publish events: write tcp <App_Server1>:32998-><logstash_server2>:<logstash_port>: write: connection reset by peer

III. Service status output,

filebeat failed

[root@<hostname> ~]# systemctl status filebeat.service
● filebeat.service - Filebeat sends log files to Logstash or directly to Elasticsearch.
   Loaded: loaded (/usr/lib/systemd/system/filebeat.service; enabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Thu 2021-10-21 10:13:41 +03; 10h ago
.
.
Oct 21 10:13:41 <hostname> systemd[1]: filebeat.service: main process exited, code=exited, status=1/FAILURE
Oct 21 10:13:41 <hostname> systemd[1]: Unit filebeat.service entered failed state.
Oct 21 10:13:41 <hostname> systemd[1]: filebeat.service failed.

Similarly, metricbeat failed,

[root@<hostname> ~]# systemctl status metricbeat.service
● metricbeat.service - Metricbeat is a lightweight shipper for metrics.
   Loaded: loaded (/usr/lib/systemd/system/metricbeat.service; enabled; vendor preset: disabled)
   Active: failed (Result: start-limit) since Thu 2021-10-21 10:13:44 +03; 10h ago
.
.
Oct 21 10:13:43 <hostname> systemd[1]: Unit metricbeat.service entered failed state.
Oct 21 10:13:43 <hostname> systemd[1]: metricbeat.service failed.

As mentioned, the heartbeat service was up and running,

[root@<hostname> ~]# systemctl status heartbeat-elastic.service
● heartbeat-elastic.service - Ping remote services for availability and log results to Elasticsearch or send to Logstash.
   Loaded: loaded (/usr/lib/systemd/system/heartbeat-elastic.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2021-10-21 10:13:01 +03; 10h ago

IV. beats config file,

heartbeat

[root@<hostname> ~]# cat /etc/heartbeat/heartbeat.yml

name: app_server1
fields_under_root: true
fields:
    host_id: app_server1

heartbeat.monitors:
  - type: http
    name: api-server1_app_server1
    enabled: true
    urls: ["http://<AppServer1_IP>:88/status"]
    schedule: '@every 10s'
    fields_under_root: true
    fields:
      app_id: api-server1-app_server1

  - type: http
    name: api-server2_app_server2
    enabled: true
    urls: ["http://<AppServer2_IP>:88/status"]
    schedule: '@every 10s'
    fields_under_root: true
    fields:
      app_id: api-server2-app_server2

setup.kibana:
  host: "http://<kibana_server1>:<kibana_port>"
  username: elastic
  password: ${es_pwd}

output.logstash:
  hosts: ['<logstash_server1>:<logstash_port>', '<logstash_server2>:<logstash_port>']
  loadbalance: true

filebeat

[root@<hostname> ~]# cat /etc/filebeat/filebeat.yml
 
name: app_server1

filebeat.inputs:
    - type: log
      fields_under_root: true
      fields:
         log_type:  api_app_server1
         app_id: node
      paths:
        - /var/log/api/server.log
        - /var/log/api/server-err.log

    - type: log
      fields_under_root: true
      fields:
         log_type:  spa_app_server1
         app_id: node
      paths:
        - /var/log/spa/server.log
        - /var/log/spa/server-err.log

processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~

setup.dashboards.enabled: true
setup.kibana:
  host: "http://<kibana_server1>:<kibana_port>"
  username: elastic
  password: ${es_pwd}

filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: true

output.logstash:
  hosts: ['<logstash_server1>:<logstash_port>', '<logstash_server2>:<logstash_port>']
  loadbalance: true
[root@<hostname> ~]#

metricbeat

[root@<hostname> ~]# cat /etc/metricbeat/metricbeat.yml

name: app_server1
fields_under_root: true
fields:
  host_id: app_server1

metricbeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: true

setup.dashboards.enabled: true
setup.kibana:
  host: "http://<kibana_server1>:<kibana_port>"
  username: elastic
  password: ${es_pwd}

output.logstash:
  hosts: ['<logstash_server1>:<logstash_port>', '<logstash_server2>:<logstash_port>'] 
  loadbalance: True

processors:
  - add_host_metadata: ~
  - add_cloud_metadata: ~
[root@<hostname> ~]#

I am checking whether there is a way to avoid this issue.

i) In Kibana --> Dashboards, if we type filebeat, we see lots of Filebeat dashboards (e.g. [Filebeat MySQL] Overview ECS). Only filebeat and metricbeat show dashboards there; there is no dashboard for heartbeat, which may be why heartbeat did not even try to connect to Kibana and its service stayed up. Are these dashboards loaded because of the setup.dashboards configuration in the beats config files above?

If yes, do we need to keep this in the beats config file? Is it required there permanently?
I think the dashboards can be set up just once from the command line.

If that is possible, can we remove the config below (if it is only there to load the dashboards)? That would at least avoid the issue of beats not being able to connect to the Kibana server when it is not available.

setup.dashboards.enabled: true
setup.kibana:
  host: "http://<kibana_server1>:<kibana_port>"
  username: elastic
  password: ${es_pwd}

For now I am not considering increasing a timeout value, as the problem can still occur if the Kibana server is unavailable for longer than that value.

There are also issues connecting to Logstash, but that can be discussed later so that we handle one issue at a time.

Thanks,

Hi Team,

Could someone please reply?

Thanks,

Is your Kibana running OK? A connection refused error means that the server rejected the connection, which could mean that the service is not running or is not listening on the specified port.

The Kibana configuration on the beats is only there to set up the dashboards; if you already set them up or do not need them anymore, you can remove the configuration.

Your Logstash errors are similar to the Kibana ones; they are connection errors.

You need to check the connection from your beats server to the Kibana and Logstash hosts on the ports that you are using.
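
As a rough sketch of that connection check, something like the following could be run from the beats server. The function name port_open is made up for this example, and it assumes bash (built with /dev/tcp support) and the coreutils timeout command are available on the host.

```shell
#!/usr/bin/env bash
# port_open HOST PORT [TIMEOUT_SECONDS]
# Hypothetical helper (not part of any Beats tooling).
# Returns 0 if a TCP connection to HOST:PORT can be opened, non-zero otherwise.
# Assumes bash with /dev/tcp redirection support and coreutils `timeout`.
port_open() {
  local host="$1" port="$2" wait="${3:-3}"
  # Inside the child shell, $0 is the host and $1 is the port.
  timeout "$wait" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

For example, `port_open <kibana_server1> <kibana_port> && echo reachable || echo "refused or timed out"`, and the same for each Logstash host and port. A failure here only says the TCP connect did not succeed; it does not tell you whether the service is down or a firewall is in the way.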

Hi @leandrojmp,

Thanks for the reply.

The connection was working before the activity. As I said, the servers were rebooted, so I think the issue occurred because Logstash and Kibana (both are on the same servers) were unavailable when the beats servers tried to connect.

I have started the beats services and they are running, but I want to avoid this if the servers are rebooted again.

So, to set this up from the command line while deploying the Elasticsearch cluster, is running the command below from just one of the beats servers (out of many) sufficient?

filebeat setup -e \
  -E output.logstash.enabled=false \
  -E output.elasticsearch.hosts=['localhost:9200'] \
  -E output.elasticsearch.username=filebeat_internal \
  -E output.elasticsearch.password=YOUR_PASSWORD \
  -E setup.kibana.host=localhost:5601

and then not adding the config below to the beats config file,

setup.dashboards.enabled: true
setup.kibana:
  host: "http://<kibana_server1>:<kibana_port>"
  username: elastic
  password: ${es_pwd}

Is this correct?

Will this setup from the command line stay there forever, even after a service restart, server reboot, etc.?

Thanks,

You do not need to run the setup command more than once; if you ran it already, you do not need to run it again.

You also do not need to keep the settings that set up the dashboards in Kibana if you already did that; it is a one-time configuration and you can remove it.

But those settings have nothing to do with your error. Your error was a connection issue; if Logstash, Elasticsearch and Kibana were not running, having those settings or not would make no difference.

You need to make sure that your services will start up on a server reboot. This depends on how you are running those services; if, for example, you are using systemd, you just need to enable them to start on boot.
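
For the systemd case, a minimal sketch could look like the one below. The helper names run and ensure_beat_running and the DRY_RUN variable are invented for this example; only the systemctl subcommands (enable, reset-failed, start) are real. The reset-failed step is included because the status output earlier in the thread shows "Result: start-limit", and clearing the failed state before starting is a common way to recover from that.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: with DRY_RUN=1 the command is printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: enable a unit at boot, clear a previous failed state
# (such as "Result: start-limit"), then start the unit.
ensure_beat_running() {
  local unit="$1"
  run systemctl enable "$unit"
  # reset-failed may report an error if the unit is not loaded; that is not fatal here.
  run systemctl reset-failed "$unit" || true
  run systemctl start "$unit"
}
```

For example, `DRY_RUN=1 ensure_beat_running filebeat.service` prints the three systemctl commands without touching the system; without DRY_RUN it needs root. Note that enabling a unit only makes systemd start it at boot; it does not make the service wait for Kibana or Logstash to be reachable.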

Hi @leandrojmp,

OK, I got your point. I was asking from the point of view of a fresh, first-time Elasticsearch cluster deployment.

I can see the list of dashboards after typing filebeat and metricbeat under Dashboards in Kibana, so I assume they were set up by the setup.dashboards configuration in the beats config file, which can be removed now.

So I am planning to set this up from the command line while deploying the ES cluster, and not add the setup.dashboards configuration above to the beats config file.


OK, but in that case, why do you think the beats were trying to connect to Kibana and Logstash? In what other case might this have happened?

I found the text below for heartbeat. I could not find the same for filebeat and metricbeat, and strangely I got this issue for filebeat and metricbeat and not for heartbeat, as said above, but it must be the same for all beats:

When dashboard loading is enabled, Heartbeat uses the Kibana API to load the sample dashboards. Dashboard loading is only attempted when Heartbeat starts up. If Kibana is not available at startup, Heartbeat will stop with an error.

So I think that after the beats server came up, each service tried to connect to Kibana and Logstash while starting, but they were not available, and after some time of retrying the service failed. Other than this, I do not know any reason why it would try to connect to them.


Yes, they are enabled, as said here.

Thanks,

Your beats configurations have the logstash output, so they will always try to connect to Logstash to send data; if Logstash is not running, you will get an error.

As I said, the Kibana settings are only needed to load the dashboards. If they were already loaded, you can remove that setting, and if Kibana is not available you will also get an error.

Also, if you are going to use the setup command to load the dashboards you do not need to use the settings in the yml files. This is explained in the documentation.

(...) To do this, you can either run the setup command (as described here) or configure dashboard loading in the filebeat.yml config file. This requires a Kibana endpoint configuration. If you didn’t already configure a Kibana endpoint, see Kibana endpoint.
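
A hedged sketch of the one-time setup route, run once from a single host for each beat type that ships dashboards, might look like this. The -E overrides mirror the command quoted earlier in the thread; the helper names, the DRY_RUN variable and the host arguments are assumptions for illustration, and credentials are left out, so add the username and password overrides your cluster needs.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: with DRY_RUN=1 the command is printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical helper: load dashboards once for filebeat and metricbeat.
# Arguments: KIBANA_HOST (host:port) and ES_HOST (host:port).
# Assumes Kibana and Elasticsearch are reachable from this host.
load_dashboards_once() {
  local kibana_host="$1" es_host="$2"
  local beat
  for beat in filebeat metricbeat; do
    run "$beat" setup --dashboards \
      -E output.logstash.enabled=false \
      -E "output.elasticsearch.hosts=['$es_host']" \
      -E "setup.kibana.host=$kibana_host"
  done
}
```

Running `DRY_RUN=1 load_dashboards_once localhost:5601 localhost:9200` first shows the exact commands. Once the dashboards are in Kibana, the setup.dashboards.enabled line can be dropped from the yml files, as described above.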

Yes, correct. Thanks.

The Logstash output is always required.

To prevent this, I can think of two options:

Either increase some timeout (I have not researched this enough yet),
or reboot the Logstash server first and then the beats servers.

What are the ways to prevent this Logstash issue?

I do not see what you are trying to prevent here or what the issue is now.

This is normal; if your Logstash server is not running, everything that is configured to send data to it will give you an error. This is by design.

Also, the beats have an internal queue where events are stored before being sent to the output, so in the case of a Logstash reboot, your events will be held in this queue while the server is unavailable. It will only be an issue if Logstash takes too long to come back or is not working.

If you do not want to get any errors when you restart your Logstash service or reboot the server, then you will need to stop everything that is sending data to Logstash before stopping Logstash.
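
One way to sketch that ordering on a beats host is shown below: stop the beats before the Logstash maintenance, then start them again only once the Logstash port accepts connections. All function names, the DRY_RUN variable and the retry defaults are made up for this example; the unit names are the ones from the status output earlier in the thread. It assumes bash with /dev/tcp support and the coreutils timeout command.

```shell
#!/usr/bin/env bash
# Unit names taken from the systemctl output in this thread.
BEATS="filebeat.service metricbeat.service heartbeat-elastic.service"

# Hypothetical wrapper: with DRY_RUN=1 the command is printed instead of executed.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# wait_for_port HOST PORT [RETRIES] [DELAY_SECONDS]
# Polls until a TCP connect succeeds; returns 1 if all retries fail.
wait_for_port() {
  local host="$1" port="$2" retries="${3:-30}" delay="${4:-2}" i=0
  while [ "$i" -lt "$retries" ]; do
    if timeout 2 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Run before stopping or rebooting Logstash.
stop_beats() {
  local unit
  for unit in $BEATS; do
    run systemctl stop "$unit"
  done
}

# Run after the maintenance; starts the beats only if the port answers.
# The port check itself always runs for real, even with DRY_RUN=1.
start_beats_when_ready() {
  local host="$1" port="$2" unit
  wait_for_port "$host" "$port" "${3:-30}" "${4:-2}" || return 1
  for unit in $BEATS; do
    run systemctl start "$unit"
  done
}
```

Usage would be `stop_beats` before the reboot and `start_beats_when_ready <logstash_server1> <logstash_port>` afterwards. With loadbalance: true and two Logstash hosts, waiting for either one may be enough, but that is a judgment call for your setup.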
