Kafka output, CPU spikes if Kafka Cluster Unavailable


#1

I have been testing Winlogbeat by dropping event log output directly into a Kafka cluster queue. It works very well with uninterrupted/reliable network connectivity. However in my test case I wanted to see the impact if the Kafka cluster was not available and how the client device behaved.

Unfortunately, if the Kafka cluster is not available the process spikes CPU usage resulting in rapid output of the log data displayed below. The default timeouts/retries for the kafka output seemed reasonable enough that I did not specifically modify the values.

Windows 7 Client Operating System

Client Config

winlogbeat:
  registry_file: C:/ProgramData/winlogbeat/.winlogbeat.yml
  event_logs:
    - name: Application
      ignore_older: 72h
    - name: Security
    - name: System
output:
  kafka:
    hosts: testkafka01:9092
    topic: winlogbeat
shipper:
logging:
  to_files: true
  files:
    path: C:/ProgramData/winlogbeat/Logs
    rotateeverybytes: 10485760 # = 10MB
  level: info

Logs

2016-05-01T12:08:05-04:00 INFO EventLog[System] Successfully published 58 events
2016-05-01T12:08:05-04:00 INFO EventLog[Security] Successfully published 54 events
2016-05-01T12:08:05-04:00 INFO EventLog[Application] Successfully published 59 events
2016-05-01T12:08:05-04:00 INFO EventLog[System] Successfully published 8 events
2016-05-01T12:08:05-04:00 INFO EventLog[Security] Successfully published 30 events
2016-05-01T12:08:05-04:00 INFO EventLog[Application] Successfully published 52 events
2016-05-01T12:09:46-04:00 INFO EventLog[Application] Successfully published 2 events
2016-05-01T12:13:07-04:00 INFO EventLog[Application] Successfully published 2 events
2016-05-01T12:13:30-04:00 INFO EventLog[Application] Successfully published 1 events
2016-05-01T12:13:31-04:00 INFO EventLog[Security] Successfully published 2 events
2016-05-01T12:13:31-04:00 INFO EventLog[Application] Successfully published 3 events
2016-05-01T12:14:02-04:00 INFO EventLog[Application] Successfully published 2 events
2016-05-01T12:15:31-04:00 WARN producer/broker/[0 826814776496] state change to [closing] because %!s(MISSING)
2016-05-01T12:15:31-04:00 WARN Closed connection to broker [testkafka01:9092]

Repeated Log Entries

2016-05-01T12:15:31-04:00 INFO Error publishing events (retrying): write tcp 172.16.93.141:49235->172.16.93.140:9092: wsasend: An existing connection was forcibly closed by the remote host.
2016-05-01T12:15:31-04:00 INFO Error publishing events (retrying): write tcp 172.16.93.141:49235->172.16.93.140:9092: wsasend: An existing connection was forcibly closed by the remote host.
... (repeats until process killed)

(Andrew Kroh) #2

There are some on going fixes being made to the Kafka output by @steffens. This is probably the same issue.


(Steffen Siering) #3

Yes, same issue. Fixed and hopefully available in alpha2 soonish.


#4

Thank you! I look forward to trying to alpha2. Any idea when soonish will be from a timing perspective?


#5

Alpha2 resolved this issue in two of our test environments.


(Steffen Siering) #6

So is problem resolved?


#7

Yes!


(Andrew Kroh) #8