Need help tuning 3K+ EPS with ForwardedEvents

I'm running Winlogbeat on a Windows Server 2012 box that is used as a Windows Event Forwarding aggregation point. 2,700+ clients report security logs to this one server in near real-time via the built-in Windows Event Forwarding process, enforced by GPO.

The issue I'm facing is performance: getting Winlogbeat to read ForwardedEvents fast enough. My server receives close to 3,000 EPS, while I can only get about 1,300 EPS out of Winlogbeat > Logstash.

More details:

My receiving point is Logstash running a beats input with a metrics filter; aside from the metrics, the output goes to null {}. I can send multiple servers to this one Logstash server and sustain close to 5K EPS ingest while staying below 20% CPU, which in my opinion proves that my Logstash input isn't the choke point.

My Winlogbeat Windows 2012 server has 8 cores and 32 GB RAM and typically sits at 20% CPU with everything running, with very little RAM usage. My Logstash receiving server has 4 cores and 16 GB RAM.

For the winlogbeat config:

#======================= Winlogbeat specific options ==========================
winlogbeat.event_logs:
  - name: ForwardedEvents
    batch_read_size: 1024

#================================ General ======================================
name: winlogbeat

fields:
  environment: prod
  domain: mydomain
  devicetype: clients

# Internal queue size for single events in processing pipeline
queue_size: 10000

# Sets the maximum number of CPUs that can be executing simultaneously. The
# default is the number of logical CPUs available in the system.
max_procs: 8

#================================ Outputs =====================================

#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["logstash-hostname:5044"]
  compression_level: 1
  worker: 4
  pipelining: 10

My Issues:

  1. Getting Winlogbeat to keep up with 3K EPS on a single box with plenty of headroom
  2. Getting Winlogbeat to ignore event record ID bookmarks for ForwardedEvents

Besides the performance issues, event record ID bookmarks are going to be a problem. I have over 2,000 devices reporting security logs to one server, and the record ID increments per unique device. If Winlogbeat uses the last event as a bookmark, it would likely skip events that still need to be processed. This may also be part of the reason I can't read the full 3K EPS; however, if that were the only cause, I'd expect my EPS rate to be significantly less than 1,300.

Any help that could be provided would be greatly appreciated.

Can you please try adding forwarded: true to your config file and see if that makes an improvement? This requires that the event subscription uses RenderedText as its format.


The last time I did any benchmarking, the most events I could squeeze out was ~2000 events per second from a single event log. See https://github.com/elastic/beats/pull/3118#issuecomment-265310544.

I profiled the code, and the slowest function (by far) was the call into Windows to get the event, so there isn't much more to optimize in the reader path. Parallelizing some of the work would probably provide the most benefit.

A single goroutine (a lightweight thread) is used by each event log reader. It reads the events, sends them to Logstash, and waits for Logstash to ACK them. In 6.x the publishing is moved to a separate thread, so this should provide some performance improvement, but I haven't tested it personally. The config options you can tune are specified in the 6.x config file; see the sketch below. You can download 6.x alpha/beta releases from the website.
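Roughly, the options in question look something like this (a minimal sketch; the values are only illustrative, not recommendations):

queue.mem.events: 4096          # internal event queue size

output.logstash:
  hosts: ["logstash-hostname:5044"]
  worker: 2                     # parallel connections to Logstash
  pipelining: 5                 # batches kept in flight per connection while waiting for ACK
  bulk_max_size: 2048           # maximum events per published batch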


Regarding the event record ID issue, I don't think it should have any effect on performance, but it will come into play on restart. There is an open issue for this: https://github.com/elastic/beats/issues/3731

Thanks for the good info.

Since I'm stuck with slow API calls, is there a way to tune how often it makes the calls, how many can be done in parallel, or the largest batch size it can request?

I currently have over 45 servers that I need Winlogbeat on, and each server is seeing ~3,000 EPS on average. Some will peak at 24K and calm down a few minutes later.

My next question is: what alternatives can keep up? Is there an option to skip the API calls and read the logs on disk? If Winlogbeat or Filebeat had an option to read the binary .evtx files, it shouldn't be bottlenecked by the API calls anymore.

I see there is an enhancement request to read the evt/evtx files.

I'm still going to try to tune Winlogbeat as-is and hope that evtx support comes soon. A 2,000 EPS ceiling will never work in our production environment when other tools can do 10K+, but I might be able to scale out to more servers to bring my per-box EPS down to around 2K. If I do use Beats in production, I might have to target 1K EPS to leave some room for spikes.

winlogbeat.event_logs:
  - name: ForwardedEvents
    forwarded: true

Try enabling forwarded and measure your EPS.

I haven't seen any change from enabling this for ForwardedEvents; however, according to the online docs, it looks like it's on by default.

From: https://www.elastic.co/guide/en/beats/winlogbeat/current/configuration-winlogbeat-options.html

event_logs.forwarded
A boolean flag to indicate that the log contains only events collected from remote hosts using the Windows Event Collector. The value defaults to true for the ForwardedEvents log and false for any other log

Indeed, it is enabled by default. I confirmed it in the code; I had forgotten it was smart enough to enable it automatically.

Are you able to test a 6.0.0-beta1 snapshot build? It would be interesting to know whether the publishing changes that were made improve performance.

Sure I don't mind testing it out.

Do you have a sample config with the options I should be using? Or should I just copy my existing from 5.5?

Here is what I'm using:

###################### Winlogbeat Configuration Example ##########################

winlogbeat.event_logs:
  - name: ForwardedEvents
    batch_read_size: 1024
#================================ General =====================================

name: winlogbeat

fields:
  environment: stage
  domain: testdomain
  devicetype: clients

#max_procs: 8  


#================================ Outputs =====================================

#----------------------------- Logstash output --------------------------------
output.logstash:
  # The Logstash hosts
  hosts: ["logstash:5044"]
  compression_level: 1
  #worker: 2
  #pipelining: 2

I don't know what to do with workers, pipelining, and max_procs, but with 5.5 I was seeing 1,300 EPS max with this config, and when running the updated 6.0 I'm seeing close to 2K EPS.

My box has 8 cores and 16 GB of RAM, and I'm not even close to using all the resources. So if I can crank this up a little, I will. The receiving Windows box is getting 2,700 EPS with spikes of 10K+.

You could try tweaking the queue.mem.events value. I think it defaults to 4096; maybe try setting it to something really large and see if it goes above 2K EPS.

For reference this is the full config with comments: https://github.com/elastic/beats/blob/c0af60c0fb39eae3d90f2ff9403a967d572046af/libbeat/_meta/config.reference.yml#L25-L43

queue.mem.events: 32736
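If it helps, the related flush settings live in the same queue section (a sketch; the flush values below are assumptions for illustration, not tested recommendations):

queue.mem:
  events: 32736
  flush.min_events: 2048      # minimum events buffered before handing a batch to the output
  flush.timeout: 1s           # maximum wait before a smaller batch is forwarded anyway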

Update:

Lowering the batch read size seemed to give a good improvement, and increasing queue.mem.events as you have above seemed to help too. Other things I changed after those improvements were setting compression higher, making the bulk size for Logstash the same as Beats, and adding workers and pipelining. I don't really know enough about workers and pipelining to know whether this is a good idea, but I'm now running close to 2,700 EPS and my ingest rate is 2,900 EPS. Getting really close, so if you have any more tips on pipelining, batches, or workers, that would be helpful. Compression seems to have the greatest impact on throughput so far, but I've only done a half day of testing: 0 compression kills EPS, and gradual increases give better EPS for each level up to 3.

winlogbeat.event_logs:
  - name: ForwardedEvents
    batch_read_size: 512

queue.mem.events: 32736

output.logstash:
  # The Logstash hosts
  hosts: ["logstash:5044"]
  compression_level: 3
  bulk_max_size: 512
  worker: 2
  pipelining: 4

This sounds like part of the bottleneck is the network and/or Logstash. If you have Winlogbeat write to a file or the console instead of Logstash, you get a better picture of the number of events Winlogbeat can process; see the sketch below. You will find some metrics written to the log file every 30 seconds. Checking the pipeline.events.active metric (via HTTP; check this PR for a list of interesting metrics), you will see how many events are currently enqueued by Winlogbeat. If the number is somewhat close to queue.mem.events, it's the output holding back throughput. If the number is closer to 0, the bottleneck is the Windows API.
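For example, swapping the Logstash output for a local file (the path and filename below are only illustrative) takes the network and Logstash out of the measurement:

output.file:
  # Write events to a local file instead of Logstash to measure raw read throughput.
  path: "C:/winlogbeat/output"
  filename: winlogbeat.out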

In 6.0 the default pipelining size is 5. Pipelining publishes another batch of events to Logstash while it is still waiting for an ACK. A value of 4 or 5 ensures that many batches are in flight, reducing potential network, encoding, and compression latencies. When increasing the workers, also enable load balancing (loadbalance: true), so the sub-batches of events are load-balanced and published in parallel. Load balancing is disabled by default, so right now worker: 2 is not active.

With pipelining: 4, worker: 2, and bulk_max_size: 512, up to 4,096 events will be in flight to Logstash. Whenever Logstash ACKs one batch, another batch of up to 512 is published. In config terms, that combination looks roughly like the sketch below.
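A minimal sketch of that output section (the host is the placeholder used earlier in the thread):

output.logstash:
  hosts: ["logstash:5044"]
  loadbalance: true           # required for worker > 1 to actually publish in parallel
  worker: 2
  pipelining: 4
  bulk_max_size: 512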

Start Winlogbeat with -httpprof 127.0.0.1:6060 and point your browser (or a script) to http://localhost:6060/debug/vars to get the live metrics from Winlogbeat.
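For example, from PowerShell on the same box (a sketch; the exact metric names exposed depend on the Winlogbeat version):

# Fetch the live metrics JSON exposed when Winlogbeat was started with -httpprof.
$vars = Invoke-RestMethod -Uri "http://127.0.0.1:6060/debug/vars"
# List all counters and look for the pipeline/queue metrics mentioned above.
$vars | Format-List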


I switched the output to write to a local file only, and I'm writing about 2,400 EPS to the local C: drive, which is an SSD.

c:\winlogbeat\winlogbeat-6.0.0-beta1-SNAPSHOT-windows-x86_64\logs> ((gc .\winlogbeat  | select-string "2017-07-25T14:50" | select-string -notmatch "Non-zero metrics" | %{$_ -replace "^.*published\s", "" -replace "\sevents$",""} | measure-object -sum).sum/60)

2406.4

Granted, this EPS metric is for one minute only; however, I ran it over a larger sample to confirm it's accurate. I don't expect it's an output limitation at this point; I think what Andrew was saying is correct, that it's an API limitation, though I don't understand the code well enough to come to that conclusion myself. I'll just have to wait it out until reading the evtx files is an option, since the current and 6.0 beta versions can't keep up with our throughput needs.
