Retrying problems - bandwidth increase

I've got a setup of a couple hundred Filebeats sending into Redis -> Logstash -> Elasticsearch, and when Redis fills up and rejects publications, the bandwidth consumed starts climbing. I'm guessing this is because each Filebeat still sends its whole batch of log lines over the connection before Redis returns the rejection, and the retries happen more frequently than batches are normally ready to be sent.

I'd like to reduce the bandwidth consumed during queue-full scenarios. It would be great if there were Filebeat parameters for retry frequency, but it looks like there aren't. I understand there's a feature in development for spooling to disk, which is great, but as far as I can tell it doesn't address this issue directly.

Also, the number of connections from the Filebeats to the queue host skyrockets during queue-full scenarios. I'm guessing this is because a publish rejection causes Filebeat to drop the connection and reconnect for the retry. It would be great if I could stop that from happening, because it results in a proliferation of brief connections and currently leaves an ongoing backlog of around 12,000 to 13,000 sockets in TIME_WAIT. That's not the end of the world (or of available network connections), but I'll be adding more clients...

There is a "timeout" option in the Redis output plugin that might be of some use in tweaking network connections, but the docs only say, "The Redis connection timeout in seconds. The default is 5 seconds." and don't clarify what generates timeout conditions or how the software addresses timeouts.

Relevant Filebeat config bits:

queue.mem:
  events: 4096
  # wait for at least 512 events to publish
  flush.min_events: 512
  # _or_ at most 10s
  flush.timeout: 10s
output.redis:
  timeout: 60 # default 5 -- try to keep connection on slow senders
  bulk_max_size: 2048 # default 2048
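
One mitigation I'm considering, purely a sketch and untested, is shrinking bulk_max_size so that a publish attempt that gets rejected resends fewer events, trading some extra round trips during normal operation for less wasted bandwidth during back pressure:

output.redis:
  timeout: 60
  # assumption: a rejected publish resends at most one batch, so a smaller
  # batch caps how much data each failed attempt wastes
  bulk_max_size: 512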

Relevant Logstash config bits:

input {
  redis {
    ..
    threads => 8
    batch_count => 125 # default 125
  }
}

and

pipeline.batch.size: 512
pipeline.workers: 4
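
For my own sanity, the rough capacity math those numbers imply (my reading of the docs, so treat it as an assumption rather than fact):

# redis input: 8 threads x batch_count 125 = up to 1000 events pulled per fetch
# pipeline:    4 workers x batch.size 512  = up to 2048 events in flight at once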

It's been recommended to me that we skip Redis entirely, ingest directly with Logstash, and rely on Logstash's queueing capability (the persistent queue). From what I gather, when Logstash is under back pressure it handles Filebeat input by refusing connections, which should eliminate the excessive bandwidth consumption.
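
If we do go that route, here's a minimal sketch of what I think the setup would look like; the host name, port, and queue size are placeholders, not recommendations:

Filebeat:

output.logstash:
  hosts: ["logstash-host:5044"]  # placeholder host

Logstash (logstash.yml):

queue.type: persisted
queue.max_bytes: 8gb  # placeholder; size to absorb the longest expected outage downstream

Logstash pipeline:

input {
  beats {
    port => 5044
  }
}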

Props to Elastic support for the help.
