Filebeat client failed to connect

We frequently get the error below in Filebeat:
Apr 19 04:14:56 c2t07851 /usr/bin/filebeat[31476]: transport.go:125: SSL client failed to connect with: dial tcp XX.XX.XX.XX:9700: i/o timeout

It goes away when we restart the Filebeat instance and Logstash, but it reappears after some time.

We are using the following versions:
filebeat-1.2.1-1.x86_64
logstash-2.3.1-1.noarch

Logstash configuration:
beats {
  port => 9700
  ssl => true
  ssl_certificate => "/etc/logstash/cert/server.crt"
  ssl_key => "/etc/logstash/cert/server.key"
}

We tried the checks below to verify connectivity, and both seem to work:
ping
telnet 9700 //we are using 9700 port

Can you also share your Filebeat config?

Here is our filebeat config

filebeat:
  # List of prospectors to fetch data.
  prospectors:
    -
      paths:
        - /opt/mount1/log/local/folder1/*.log
        - /opt/mount1/log/local/folder2/*.log
        - /opt/mount1/log/local/folder3/*.log
        - /opt/mount1/log/local/folder4/*.log
      encoding: plain
      input_type: log
      ignore_older: 24h
      document_type: log4j
      multiline:
        pattern: ^[0-9]{4}-[0-9]{2}-[0-9]{2}T([0-9]{2}:){2}[0-9]{2},[0-9]
        negate: true
        match: before
    -
      paths:
        - /opt/mount1/log/local/legacy/*.log
      encoding: plain
      input_type: log
      ignore_older: 24h
      document_type: legacy
      multiline:
        pattern: ^(.*)-*[[0-9]{4}-[0-9]{2}-[0-9]{2}T([0-9]{2}:){2}[0-9]{2},[0-9]
        negate: true
        match: before
  registry_file: /var/lib/filebeat/registry
output:
  logstash:
    hosts: ["host1:9700","host2:9700"]
    worker: 15
    loadbalance: true
    tls:
      certificate_authorities:
        - /usr/local/st/filebeat/cert55c/server.crt
        - /usr/local/st/filebeat/cert54c/server.crt
      insecure: true

Which version of the logstash-beats-plugin are you using?

@steffens Any idea here?

  1. Do you have any log messages from Logstash? If Logstash decides it is 'overloaded', it will not accept connections for N seconds. The Logstash output in Beats has a default timeout of 30s, and it seems the connection could not be established within that time. With 30 workers in total (15 workers for each of the 2 hosts), a subset of workers might still be connected, though. Have you checked with netstat how many connections are established?

  2. This Filebeat config, as is, doesn't really make use of load balancing, so there is no need to have a total of 30 workers pushing to Logstash. By default, Filebeat pushes to one worker only and waits for the ACK before pushing to another worker. To have load balancing work properly in Filebeat, there are two options:

Option 1:
Enable publish_async: true in the filebeat section. This option creates batches and publishes them fully asynchronously. CPU and memory usage will increase considerably.
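
As a rough sketch of Option 1, the relevant part of the config might look like the following. It assumes publish_async sits directly under the filebeat section, as described above, and reuses the hosts from the config posted earlier. Everything not shown stays as it is.

```yaml
filebeat:
  # Publish batches without waiting for the previous ACK.
  # Expect noticeably higher CPU and memory usage.
  publish_async: true
  # prospectors and registry_file unchanged from the config posted above

output:
  logstash:
    hosts: ["host1:9700","host2:9700"]
    loadbalance: true
    # tls section unchanged from the config posted above
```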

Option 2:
Increase the spooler size (spool_size in the filebeat section) to be a multiple of bulk_max_size in the logstash output. By default both values are 2048. With spool_size = 30*bulk_max_size, the batch of lines created by the spooler is divided into 30 mini-batches, which the logstash output forwards with full load balancing. We still have to wait for all Logstash instances to ACK the publish request before sending the next batch. Throughput might be a little lower than with publish_async: true, but so might memory usage.
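
A minimal sketch of Option 2, assuming the spooler size is set through the spool_size key in the filebeat section (please verify the key name against the reference config shipped with your Filebeat version). With the default bulk_max_size of 2048 and 30 workers in total, the arithmetic is 30 * 2048 = 61440.

```yaml
filebeat:
  # 30 workers in total (2 hosts x 15 workers) * bulk_max_size 2048 = 61440,
  # so each spooler flush can be split into 30 mini-batches.
  spool_size: 61440
  # prospectors and registry_file unchanged from the config posted above

output:
  logstash:
    hosts: ["host1:9700","host2:9700"]
    worker: 15
    loadbalance: true
    # Default value, written out only to make the ratio explicit.
    bulk_max_size: 2048
    # tls section unchanged from the config posted above
```

If 30 workers turn out to be more than needed, lowering worker and scaling spool_size down by the same factor keeps the same ratio.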