Winlogbeat and huge Windows Security Log

(James Watson) #1

I'm on the latest 5.0 release of Elasticsearch/Logstash/Kibana/Beats and am successfully using Winlogbeat to send the Security logs of multiple Windows servers to Logstash, which outputs to Elasticsearch. The events are viewable in Kibana as expected.

I'm now having issues sending the Security log from a domain controller using the same configuration. Winlogbeat runs, but the process immediately begins consuming 15-20% of CPU continuously and no events are ever visible in Kibana. I've tried both the Logstash and Elasticsearch outputs in winlogbeat.yml, with the same result.

Looking at the DCs, I noticed that at some point in the past the Security log size was increased from the default to 5 GB (sigh, don't ask). Since I'm having no issues with Windows servers that use the default Security log size, I suspect this is related.

So now I'm looking for ways to first confirm whether this suspicion is correct, and then to adjust my configuration to work around it. I've already tried:

  - name: Security
    ignore_older: 1h

in the winlogbeat.yml

There is no indication of a connection reset in logstash-plain.log (which does happen sometimes if I don't filter with ignore_older: 1h), and the local Winlogbeat log file reports a continuous stream of:

> 2016-11-03T08:56:40-05:00 INFO EventLog[Security] Successfully published 100 events
> 2016-11-03T08:56:41-05:00 INFO EventLog[Security] Successfully published 100 events
> 2016-11-03T08:56:41-05:00 INFO Non-zero metrics in the last 30s: libbeat.publisher.published_events=4700 libbeat.logstash.published_and_acked_events=4700 msg_file_cache.SecurityHits=4700 libbeat.logstash.publish.write_bytes=1191046 published_events.Security=4700 libbeat.logstash.call_count.PublishEvents=47 libbeat.logstash.publish.read_bytes=282
> 2016-11-03T08:56:41-05:00 INFO EventLog[Security] Successfully published 100 events
> 2016-11-03T08:56:42-05:00 INFO EventLog[Security] Successfully published 100 events

Any suggestions for how to troubleshoot/proceed?

(Andrew Kroh) #2

@jameswatson3 Based on the logs you posted it would appear that the data is flowing successfully to Logstash. Could there be some problem in Logstash?
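To put a number on "flowing successfully", here is a minimal Python sketch that parses the "Non-zero metrics" line from the log above and derives an approximate event rate. The function name `parse_metrics_line` is made up for this example and is not part of Winlogbeat; the sketch also assumes the line keeps the space-separated `name=value` layout shown in the post.

```python
import re

# Matches the periodic summary line, capturing the interval and the metric list.
METRICS_RE = re.compile(r"Non-zero metrics in the last (\d+)s: (.*)")


def parse_metrics_line(line):
    """Return (interval_seconds, {metric: int}), or None if this is not a metrics line."""
    match = METRICS_RE.search(line)
    if match is None:
        return None
    interval = int(match.group(1))
    metrics = {}
    for pair in match.group(2).split():
        name, _, value = pair.partition("=")
        if value.isdigit():
            metrics[name] = int(value)
    return interval, metrics


# The metrics line quoted in the original post.
line = (
    "2016-11-03T08:56:41-05:00 INFO Non-zero metrics in the last 30s: "
    "libbeat.publisher.published_events=4700 "
    "libbeat.logstash.published_and_acked_events=4700 "
    "msg_file_cache.SecurityHits=4700 "
    "libbeat.logstash.publish.write_bytes=1191046 "
    "published_events.Security=4700 "
    "libbeat.logstash.call_count.PublishEvents=47 "
    "libbeat.logstash.publish.read_bytes=282"
)

interval, metrics = parse_metrics_line(line)
acked = metrics["libbeat.logstash.published_and_acked_events"]
rate = acked / interval
print(f"{acked} events acknowledged by Logstash in {interval}s (about {rate:.0f} events/s)")
```

For the posted line this works out to roughly 157 acknowledged events per second, and the published and acknowledged counts match, which is consistent with Logstash accepting everything Winlogbeat sends.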

You could test Winlogbeat with the file output instead of the Logstash output just to verify it is collecting data (but the logs kind of confirm that it is). Another way to get some insights from Winlogbeat is to run it with -httpprof localhost:6060 then browse http://localhost:6060/debug/vars. This will show some metrics from Winlogbeat in JSON format. There are stats from the libbeat outputs, number of events read by Winlogbeat, etc.
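If you would rather poll that endpoint from a script than from a browser, something along these lines could work. This is only a sketch: it assumes Winlogbeat was started with -httpprof localhost:6060 as described above, and that the counters of interest appear as top-level numeric keys in the JSON. The helper names `fetch_vars` and `nonzero_counters` are hypothetical, and the sample document below merely reuses counter names and values from the posted log for illustration.

```python
import json
from urllib.request import urlopen


def fetch_vars(url="http://localhost:6060/debug/vars", timeout=5):
    """Fetch and decode the JSON document served at /debug/vars."""
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def nonzero_counters(doc):
    """Keep top-level numeric entries that are non-zero.

    Nested objects and lists are skipped, so only flat counters remain
    (assumption: the metrics you care about are flat).
    """
    return {
        name: value
        for name, value in doc.items()
        if isinstance(value, (int, float))
        and not isinstance(value, bool)
        and value != 0
    }


# Illustrative document only; a live one comes from fetch_vars().
sample = {
    "cmdline": ["winlogbeat.exe", "-httpprof", "localhost:6060"],
    "libbeat.logstash.published_and_acked_events": 4700,
    "libbeat.logstash.call_count.PublishEvents": 47,
    "libbeat.logstash.published_but_not_acked_events": 0,
    "memstats": {"Alloc": 1},
}

for name, value in sorted(nonzero_counters(sample).items()):
    print(f"{name} = {value}")
```

Against a running Winlogbeat, replace `sample` with `fetch_vars()`. Running it twice a few seconds apart and comparing the counters shows whether events are still being read and acknowledged.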

You can also increase the log level to debug in your config file. (Warning: this will produce a lot of data).

While you are testing you may need to delete the Winlogbeat registry file at C:\ProgramData\winlogbeat\.winlogbeat.yml to get Winlogbeat to re-read events.

Connection resets occur when there is congestion in the Logstash pipeline. It could be backpressure from ES. I recommend setting the congestion_threshold to a very high value so that the connection doesn't get reset (essentially disabling it). The protocol between Beats and Logstash handles the back-pressure so there is no need to reset the connection on congestion.

(James Watson) #3

@andrewkroh,

Thank you very much for taking the time to provide such detailed advice. As you observed, data was flowing but there was apparently significant overhead related to the size of the log in our case which was causing my confusion. Luckily we are going to be able to address that root cause and I fully expect Beats to begin performing in the fashion I'm used to on other systems.

Thanks again. I'm new to this suite of products and am really enjoying my experience so far given the quality of the product, documentation and community.
