[Originally posted this on the Elasticsearch forum (Throttling processing of log backfill), but it was suggested that it was more of a Filebeat or Logstash question. Sorry for the cross posting.]
I've done my own research on this, but have not yet found a clear solution. Hoping someone can advise.
We have a very simple single-node ELK server and a client with
Filebeat (Filebeat -> LS -> ES). We don't anticipate a high volume
(even in Production).
Problem is, when initially starting the server, we want to "backfill"
it with a few months worth of logs (say 600+ 1MB daily log files of
various types). Filebeat takes off running, loading as many harvesters
as it can, and floods the ELK server as if there's no tomorrow. LS seems
to keep up OK, but ES gets overwhelmed pretty quickly (returning 429 /
"rate limiting" errors constantly during the backfill operation). Though
it appears LS will keep retrying until successful, I've seen evidence
that a lot of messages are getting lost (and never making it into ES).
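In case it helps frame the question, this is roughly the kind of throttling I've been experimenting with on the Filebeat side (a sketch only -- the paths and host are placeholders, and I'm not sure these are the right knobs):

```yaml
filebeat.inputs:
  - type: log
    paths:
      - /var/log/myapp/*.log   # placeholder path
    # Cap how many files are harvested in parallel (the default is unlimited,
    # which is what lets Filebeat open hundreds of backfill files at once)
    harvester_limit: 4
    # Backfill files are static, so release a harvester as soon as a file
    # is fully read, freeing a slot under harvester_limit for the next file
    close_eof: true

output.logstash:
  hosts: ["logstash:5044"]   # placeholder host
  # Send smaller batches per connection to smooth the flood into LS
  bulk_max_size: 512
```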
On the one hand we could attempt to size and configure the server to
support this initial flood, but that doesn't seem appropriate (since
this is a one-off operation; if it takes a few hours to catch up, no
problem).
How can we safely process a significant "backlog" of files -- as a
one-off -- on a modest server, with the various components "throttling"
traffic so that ES isn't overwhelmed (which currently results in errors and missing documents)?
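For completeness, on the Logstash side the only levers I've found so far are the pipeline settings in logstash.yml, which I assume would reduce the size and concurrency of bulk requests hitting ES (again just a sketch, not a tested config):

```yaml
# logstash.yml -- sketch; values are guesses, not recommendations
pipeline.workers: 2        # fewer batches being processed/sent to ES concurrently
pipeline.batch.size: 125   # the default; smaller values mean smaller bulk requests
pipeline.batch.delay: 50   # ms to wait while filling a partial batch
```

Is tuning these the right approach, or is there a better-supported way to rate-limit a backfill end to end?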