I have an environment with fairly high Windows EPS and am looking for the best way to alleviate bottlenecks in processing Windows Event Logs. Overall, I'm seeing around 100k EPS of Windows events sustained on average, with spikes of up to 300k EPS. The events are collected on ~25 WEF collectors, where we have installed Winlogbeat agents. We also filter unnecessary events at the GPO level, so those are not part of the event counts mentioned above.
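For a rough sense of scale, here is the per-collector load those numbers imply. This is only back-of-the-envelope arithmetic, and it assumes events are spread evenly across the 25 collectors, which is probably not exactly true in practice.

```python
def per_collector_eps(total_eps: int, collectors: int) -> float:
    """Average load per collector, assuming an even spread (an assumption)."""
    return total_eps / collectors


# Figures from the post: 100k EPS sustained, 300k EPS peak, ~25 collectors.
sustained = per_collector_eps(100_000, 25)
peak = per_collector_eps(300_000, 25)

print(f"sustained: {sustained:.0f} EPS per collector")
print(f"peak:      {peak:.0f} EPS per collector")
```

So each collector would need to move roughly 4k EPS sustained and 12k EPS at peak under that assumption.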
It looks like the Winlogbeat agent can only consume and send so much at a time: we are topping out at around 2k EPS. I've found some tuning tips that I am going to try, but I think we'll still hit a limit at some point. Any tips on diagnosing where the bottleneck is in Winlogbeat? I don't know whether it can't read fast enough or can't send fast enough, but the network and the downstream Logstash don't seem to be the bottleneck.
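One approach I'm considering for the read-versus-send question is to poll the Beat's local HTTP stats endpoint (turned on with `http.enabled: true`) twice and compare counters. The sketch below is not a definitive method. It assumes the endpoint is at `http://localhost:5066/stats` and that the counter names `libbeat.pipeline.events.published`, `libbeat.pipeline.events.active`, and `libbeat.output.events.acked` match what my version actually returns, so those should be checked against real output. The 80% threshold and the `diagnose` and `fetch_stats` names are my own, purely for illustration.

```python
import json
import urllib.request


def fetch_stats(url: str = "http://localhost:5066/stats") -> dict:
    """Fetch one stats snapshot. The URL is an assumed default; adjust as needed."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)


def get_path(stats: dict, dotted: str):
    """Walk a nested dict using a dotted key such as 'libbeat.output.events.acked'."""
    node = stats
    for key in dotted.split("."):
        node = node[key]
    return node


def diagnose(before: dict, after: dict, interval_s: float, queue_capacity: int) -> dict:
    """Compare two snapshots taken interval_s seconds apart.

    queue_capacity should match queue.mem.events in the Winlogbeat config.
    Counter names are assumptions; verify them against the real /stats output.
    """
    published = (get_path(after, "libbeat.pipeline.events.published")
                 - get_path(before, "libbeat.pipeline.events.published"))
    acked = (get_path(after, "libbeat.output.events.acked")
             - get_path(before, "libbeat.output.events.acked"))
    fill = get_path(after, "libbeat.pipeline.events.active") / queue_capacity

    # Heuristic: a queue that stays nearly full suggests the output cannot drain
    # it fast enough; a mostly empty queue suggests the reader is the slow side.
    side = "output (send)" if fill >= 0.8 else "input (read)"
    return {
        "published_eps": published / interval_s,
        "acked_eps": acked / interval_s,
        "queue_fill": fill,
        "likely_limit": side,
    }


# Hand-made snapshots standing in for two fetch_stats() calls 10 seconds apart.
snap_a = {"libbeat": {"pipeline": {"events": {"published": 10_000, "active": 150}},
                      "output": {"events": {"acked": 9_000}}}}
snap_b = {"libbeat": {"pipeline": {"events": {"published": 30_000, "active": 200}},
                      "output": {"events": {"acked": 29_000}}}}
print(diagnose(snap_a, snap_b, interval_s=10, queue_capacity=4000))
```

With real snapshots, a single reading is probably not enough; sampling over several minutes during a spike would give a more trustworthy picture.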
We are looking at potentially installing multiple Winlogbeat agents/services on each host, each consuming a different slice of the events, so that Winlogbeat itself isn't the bottleneck. From what I can tell, it's possible to configure each one to consume specific event IDs or particular channels. For example, one could consume the Security channel while the other two consume System and Application; or one could consume the top 3 event codes while another consumes the rest.
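To make the event-ID split concrete, here is a small sketch that builds the `winlogbeat.event_logs` section for two instances: one that includes only the busiest IDs and one that excludes them. It relies on my reading of the `event_id` option, where a leading minus sign excludes an ID. The channel name and the three event IDs are placeholders I picked for illustration, not measured values, and `split_by_event_id` is a hypothetical helper.

```python
import json


def split_by_event_id(channel: str, hot_ids: list[int]) -> tuple[dict, dict]:
    """Return (hot, rest) config fragments for two Winlogbeat instances.

    hot  : reads only the listed event IDs from the channel.
    rest : reads everything else by excluding those IDs with a '-' prefix.
    """
    include = ", ".join(str(i) for i in hot_ids)
    exclude = ", ".join(f"-{i}" for i in hot_ids)
    hot = {"winlogbeat.event_logs": [{"name": channel, "event_id": include}]}
    rest = {"winlogbeat.event_logs": [{"name": channel, "event_id": exclude}]}
    return hot, rest


# Placeholder channel and IDs; substitute the real top event codes.
hot, rest = split_by_event_id("Security", [4624, 4625, 4688])
print(json.dumps(hot, indent=2))
print(json.dumps(rest, indent=2))
```

The output is JSON, which a YAML parser should also accept, so each fragment could be pasted into the matching instance's `winlogbeat.yml`. The two filters are complementary, so no event should be read twice or dropped, provided both instances point at the same channel.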
Does anyone have experience running multiple Winlogbeat agents on a single host? Should I duplicate the full package into separate program folders so that each executable is separate? Do I need to change the service names so they are different? Any help on diagnosing where in the Winlogbeat forwarding process bottlenecks can occur would also be appreciated.
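My current guess at the layout is sketched below: one folder per instance, each with its own service name, config file, data directory, and log directory. The important part, as I understand it, is that instances must not share a data directory, because that is where Winlogbeat keeps its read position. The base folder and the instance names are hypothetical, and the `-c`, `--path.home`, `--path.data`, and `--path.logs` arguments are the standard Beats command-line flags as far as I know. I would welcome corrections.

```python
from pathlib import PureWindowsPath


def instance_layout(base_dir: str, names: list[str]) -> list[dict]:
    """Build a distinct service name and argument list for each instance."""
    base = PureWindowsPath(base_dir)
    layouts = []
    for name in names:
        home = base / name  # separate folder per instance
        layouts.append({
            "service_name": name,  # Windows service names must be unique
            "args": [
                "-c", str(home / "winlogbeat.yml"),
                "--path.home", str(home),
                "--path.data", str(home / "data"),  # never shared between instances
                "--path.logs", str(home / "logs"),
            ],
        })
    return layouts


# Hypothetical base folder and instance names.
layouts = instance_layout(
    r"C:\Program Files\Winlogbeat",
    ["winlogbeat-security", "winlogbeat-system", "winlogbeat-application"],
)
for item in layouts:
    print(item["service_name"], " ".join(item["args"]))
```

Each argument list would then be passed to whatever registers the service, for example an adapted copy of the install script that ships with the package.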