Redis input plugin back pressure issue

Hi, we are using ELK 6.3 with the data flow Filebeat -> Redis -> Logstash -> Elasticsearch. During the last few days we are doing several round of failure tests to study the impact of components down. We noticed when Redis tier was down, filebeat would keep retry connect frequently and infinitely with messages sample "2018-10-03T14:16:53.459+0800 ERROR pipeline/output.go:74 Failed to connect: dial tcp xx.xxx.xxx.xxx:6379: getsockopt: connection refused". We tried many settings like backoff, timeout but the problem still persisted.

On the other hand, we tried use Filebeat -> Logstash -> Elasticsearch, the back pressure setting in filebeat work fine. Any idea how can we solve the frequent re-connect issue between Filebeat and Redis?

Hi @Leo_Kwok :slight_smile:

The following is a list of configuration parameteres that you can set on the Redis output.

#------------------------------- Redis output ----------------------------------
#output.redis:
  # Boolean flag to enable or disable the output module.
  #enabled: true

  # Configure JSON encoding
  #codec.json:
    # Pretty print json event
    #pretty: false

    # Configure escaping html symbols in strings.
    #escape_html: true

  # The list of Redis servers to connect to. If load balancing is enabled, the
  # events are distributed to the servers in the list. If one server becomes
  # unreachable, the events are distributed to the reachable servers only.
  #hosts: ["localhost:6379"]

  # The Redis port to use if hosts does not contain a port number. The default
  # is 6379.
  #port: 6379

  # The name of the Redis list or channel the events are published to. The
  # default is filebeat.
  #key: filebeat

  # The password to authenticate with. The default is no authentication.
  #password:

  # The Redis database number where the events are published. The default is 0.
  #db: 0

  # The Redis data type to use for publishing events. If the data type is list,
  # the Redis RPUSH command is used. If the data type is channel, the Redis
  # PUBLISH command is used. The default value is list.
  #datatype: list

  # The number of workers to use for each host configured to publish events to
  # Redis. Use this setting along with the loadbalance option. For example, if
  # you have 2 hosts and 3 workers, in total 6 workers are started (3 for each
  # host).
  #worker: 1

  # If set to true and multiple hosts or workers are configured, the output
  # plugin load balances published events onto all Redis hosts. If set to false,
  # the output plugin sends all events to only one host (determined at random)
  # and will switch to another host if the currently selected one becomes
  # unreachable. The default value is true.
  #loadbalance: true

  # The Redis connection timeout in seconds. The default is 5 seconds.
  timeout: 5s

  # The number of times to retry publishing an event after a publishing failure.
  # After the specified number of retries, the events are typically dropped.
  # Some Beats, such as Filebeat, ignore the max_retries setting and retry until
  # all events are published. Set max_retries to a value less than 0 to retry
  # until all events are published. The default is 3.
  #max_retries: 3

  # The number of seconds to wait before trying to reconnect to Redis
  # after a network error. After waiting backoff.init seconds, the Beat
  # tries to reconnect. If the attempt fails, the backoff timer is increased
  # exponentially up to backoff.max. After a successful connection, the backoff
  # timer is reset. The default is 1s.
  backoff.init: 1s

  # The maximum number of seconds to wait before attempting to connect to
  # Redis after a network error. The default is 60s.
  backoff.max: 60s

  # The maximum number of events to bulk in a single Redis request or pipeline.
  # The default is 2048.
  #bulk_max_size: 2048

  # The URL of the SOCKS5 proxy to use when connecting to the Redis servers. The
  # value must be a URL with a scheme of socks5://.
  #proxy_url:

  # This option determines whether Redis hostnames are resolved locally when
  # using a proxy. The default value is false, which means that name resolution
  # occurs on the proxy server.
  #proxy_use_local_resolver: false

  # Enable SSL support. SSL is automatically enabled, if any SSL setting is set.
  #ssl.enabled: true

  # Configure SSL verification mode. If `none` is configured, all server hosts
  # and certificates will be accepted. In this mode, SSL based connections are
  # susceptible to man-in-the-middle attacks. Use only for testing. Default is
  # `full`.
  #ssl.verification_mode: full

  # List of supported/valid TLS versions. By default all TLS versions 1.0 up to
  # 1.2 are enabled.
  #ssl.supported_protocols: [TLSv1.0, TLSv1.1, TLSv1.2]

  # Optional SSL configuration options. SSL is off by default.
  # List of root certificates for HTTPS server verifications
  #ssl.certificate_authorities: ["/etc/pki/root/ca.pem"]

  # Certificate for SSL client authentication
  #ssl.certificate: "/etc/pki/client/cert.pem"

  # Client Certificate Key
  #ssl.key: "/etc/pki/client/cert.key"

  # Optional passphrase for decrypting the Certificate Key.
  #ssl.key_passphrase: ''

  # Configure cipher suites to be used for SSL connections
  #ssl.cipher_suites: []

  # Configure curve types for ECDHE based cipher suites
  #ssl.curve_types: []

  # Configure what types of renegotiation are supported. Valid options are
  # never, once, and freely. Default is never.
  #ssl.renegotiation: never

Take a look at the ones that are uncommented: timeout, backoff.init and backoff.max seem like the thing you are looking for.

Best regards

Hi Mario, we tried the suggested settings already but found it did not work.

Hi @Leo_Kwok but what's exactly what you are trying to achieve? I mean, is the problem that too many log messages are being written?

Can you explain what is to work in your words, because maybe I'm looking into the wrong direction.

Hi Mario, we observed two symptoms of the issue:

  1. Many log messages shown above are written in filbeat log
  2. Connection growth in the server as it keep doing DNS query

I expected the back pressure behavior should function like that of logstash output.

Hi Mario, I just found that Filebeat 6.4 solved the issue. Do you know the fix detail? Thanks.

I'm glad that the problem is solved.

I'm not sure about the fix detail, just peeking at the changelog I found this but it's from 2016

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.