Throttling processing of log backfill

I've done my own research on this, but have not yet found a clear solution. Hoping someone can advise.

We have a very simple single-node ELK server and a client with Filebeat (Filebeat -> LS -> ES). We don't anticipate a high volume (even in Production).

Problem is, when initially starting the server, we want to "backfill" it with a few months' worth of logs (say 600+ 1MB daily log files of various types). Filebeat takes off running, spinning up as many harvesters as it can, and floods the ELK server as if there's no tomorrow. LS seems to keep up OK, but ES gets overwhelmed pretty quickly, constantly returning 429 / "rate limiting" errors during the backfill. Though it appears LS will keep retrying until successful, I've seen evidence that a lot of messages are getting lost and never making it into ES.
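
For reference, the closest knob I've found in my own research is Filebeat's per-input harvester limit. A minimal sketch, assuming a recent Filebeat; the paths and values here are illustrative, not our actual config:

```yaml
# filebeat.yml (sketch -- paths and values are illustrative)
filebeat.inputs:            # "filebeat.prospectors" on older Filebeat versions
  - type: log
    paths:
      - /var/log/app/*.log  # hypothetical location of the backfill files
    harvester_limit: 4      # cap concurrent harvesters for this input
    close_eof: true         # free a harvester once a file is fully read,
                            # so the limit doesn't starve the remaining files
```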

On the one hand, we could attempt to size and configure the server to support this initial flood, but that doesn't seem appropriate: this is a one-off operation, and if it takes a few hours to catch up, no big deal.

How can we safely process a significant "backlog" of files -- as a one-off -- on a modest server, having the various components "throttle" the traffic so that ES isn't overwhelmed (which seems to be what causes the errors and missing documents)?
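
The other knob I've been eyeing is the batch size on Filebeat's Logstash output; again just a sketch, with illustrative host and values:

```yaml
# filebeat.yml (sketch -- host and values are illustrative)
output.logstash:
  hosts: ["elk-server:5044"]  # hypothetical ELK host
  bulk_max_size: 512          # smaller batches per request (default is 2048)
  worker: 1                   # a single publisher worker limits concurrency
```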

Thoughts?

Thanks,
Greg

I think this might be more of a Logstash or Filebeat question than an Elasticsearch one. There isn't a whole lot more that Elasticsearch can do; it is already providing backpressure to Logstash/Filebeat. Maybe open the question on the Logstash forum?
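
If it helps as a starting point over there, the usual Logstash-side knobs look something like the sketch below (illustrative values, assuming a Logstash version that supports the persistent queue):

```yaml
# logstash.yml (sketch -- values are illustrative starting points)
pipeline.workers: 2        # fewer workers means fewer concurrent bulk requests to ES
pipeline.batch.size: 125   # events per worker batch (125 is the default)
pipeline.batch.delay: 50   # ms to wait while filling a partial batch
queue.type: persisted      # buffer bursts on disk instead of only in memory
queue.max_bytes: 1gb       # cap disk usage for the persistent queue
```

The persistent queue only absorbs the burst; it's the worker and batch settings that actually slow the flow into ES.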

Thanks Nik. That seems reasonable. I'll do that.

Greg
