Winlogbeat performance issue


We forward all of our very busy Active Directory event logs to a single host. I am trying to read these forwarded event logs with Winlogbeat, but it seems to be ignoring messages. I have no issues on the Elasticsearch side. It basically seems to ignore the most recent events and only occasionally publishes 50 or so. I've been tailing the logs, but they give me no useful information about why it's not processing events.

I've tried different configs and different outputs, but nothing seems to resolve my issue.

Windows version: Server 2016

Elasticsearch version: 6.3.2

winlogbeat version: 6.3.2



  - name: ForwardedEvents
    ignore_older: 4h
    batch_read_size: 1024

events: 32736 "winlogbeat"
setup.template.pattern: "ad-log-*"
service: AD
fields: ["[host]","event_data.ProcessName","event_data.TransmittedServices","_score","event_data.LogonGuid","provider_guid", "event_data.KeyLength","event_data.ProcessId","event_data.TargetLogonGuid","source_name", "record_number", "thread_id", "process_id","event_data.TargetUserSid","event_data.TargetSid","event_data.ServiceSid"]
enabled: true
hosts: ["elk6-data1:9200","elk6-data2:9200","elk6-data3:9200","elk6-data4:9200","elk6-data5:9200","elk6-data6:9200"]
index: "ad-log-%{+yyyy.MM.dd}"
bulk_max_size: 0

name: winlogbeat
rotateeverybytes: 10485760 # = 10MB
keepfiles: 2




Winlogbeat reads messages from the Event Log, which returns events in batches. The publisher does not split these into new, smaller batches; it simply forwards them as is. In this case bulk_max_size is irrelevant and has no effect on the publisher pipeline.

I've tried it with and without bulk_max_size and it makes little difference. This does not explain why it's ignoring new events. If I run it from the command line it happily reads the old events, but when it gets up to the new events it stalls.

So the new events don't make it to the output?

There are several things you can try.

You can enable debug logging. Two of the more useful debug selectors related to reading event logs are eventlog and eventlog_detail. Each time Winlogbeat asks Windows for more events, it will report how many it received.

logging.level: debug
logging.selectors:
  - eventlog
  - eventlog_detail # You might want to remove this one if it's too much output.

It would be helpful if you could share the Winlogbeat log output that you have. By default it logs a set of metrics every 30s that contain some information about publishing to ES as well as metrics from Winlogbeat.
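For reference, the periodic metrics mentioned above are controlled by the standard Beats logging settings. This is a minimal sketch showing what I believe are the defaults written out explicitly, in case they were changed in your config; it is not taken from the original poster's file.

```yaml
# Write logs to files so they can be collected and shared.
logging.to_files: true

# Periodic internal metrics; these values should match the defaults.
logging.metrics.enabled: true
logging.metrics.period: 30s
```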

BTW, it can actually be counter-productive to set batch_read_size: 1024 if the messages being received are large, because an error can result and Winlogbeat must then re-read with a smaller batch size. You can monitor for this behavior by checking the metrics that are logged. There is a read_errors object in the metrics output that maps each error name/number (1734 in this case) to an occurrence count. If you are hitting those errors, then a batch_read_size of 512 might be better.
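As a rough sketch, the smaller batch size would look like this in the event log section. The enclosing winlogbeat.event_logs key is assumed here, since the section headers are missing from the config as posted; the other values are copied from the original post.

```yaml
winlogbeat.event_logs:
  - name: ForwardedEvents
    ignore_older: 4h
    # Reduced from 1024; only worth keeping if read_errors shows
    # error 1734 in the logged metrics.
    batch_read_size: 512
```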

Another thing you can try is to replace the Elasticsearch output with a File output and see if the blocking still occurs. If it goes away, then it's probably back-pressure from ES that causes reading to stop once the in-memory queue fills up.
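A File output for this test could be sketched as follows. The directory is a hypothetical path chosen for illustration, and the existing output.elasticsearch section needs to be commented out or disabled, since only one output can be enabled at a time.

```yaml
# Temporarily disable output.elasticsearch while testing.
output.file:
  # Hypothetical directory; pick any location with enough free disk space.
  path: "C:/ProgramData/winlogbeat/output-test"
  filename: winlogbeat
```

If events keep flowing to the file while the forwarded log is busy, that points at the Elasticsearch side rather than at event log reading.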

Hopefully this gets you a little further in understanding and debugging your issue.
