Are the CPUs on the Logstash hosts saturated?
The CPUs are definitely not saturated. These nodes have 24 CPUs each and are way underutilized, rarely reaching even 100%.
If no, can you increase the number of pipeline workers?
How/where do I increase the pipeline workers? I saw a command I can run to start Logstash with a non-standard number of workers, but this needs to persist through reboots, service restarts, etc.
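For what it's worth, the `-w` flag only applies to that one invocation. The persistent place is the Logstash settings file. A minimal sketch, assuming Logstash 5.x with the default settings path (`/etc/logstash/logstash.yml`):

```yaml
# /etc/logstash/logstash.yml
# Settings here survive reboots and service restarts,
# unlike the -w / --pipeline.workers command-line flag.
pipeline.workers: 24

# Often tuned alongside workers; 125 is the default batch size.
pipeline.batch.size: 125
```

After editing, restart the Logstash service for the change to take effect.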
What kind of event rate are you getting per host?
It's pretty poor right now -- about 500 events/s across 4 Logstash hosts, two of which are also dedicated ES master nodes. The messages can be pretty long and complex, but that's still a shabby number. Under a similar setup I used to receive 1200-1600/s on average, so something must have changed to really reduce that number.
I'm somewhat convinced Logstash is the bottleneck, as the indexing latency in ES is only about 7 ms. The only other potential bottleneck I can see is the client nodes being monitored. They're pretty beefy boxes as well, with 24 CPUs and 128 GB of memory. I'm primarily using Filebeat on those machines and am planning to upgrade to Filebeat 5.0 today. I suppose I'll also up the number of workers to 24 or so on those boxes.
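On the Filebeat side, the knobs that matter for fan-out live under the Logstash output. A sketch of the relevant `filebeat.yml` fragment for 5.x -- the hostnames and counts here are placeholders, not my actual setup:

```yaml
# filebeat.yml (Filebeat 5.x)
# Spread events across all Logstash hosts instead of
# sticking to a single connection.
output.logstash:
  hosts: ["logstash1:5044", "logstash2:5044", "logstash3:5044", "logstash4:5044"]
  loadbalance: true   # distribute across hosts rather than failover-only
  worker: 6           # connections per host; 4 hosts x 6 = 24 total
```

Without `loadbalance: true`, Filebeat treats the extra hosts as failover targets only, which can leave most of the Logstash fleet idle.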
I'd prefer not to use Redis or Kafka, as it's another point of failure and another thing to maintain, but if these worker settings don't change much, I'll likely implement a queueing service to reduce latency.