Winlogbeat v5 fail to start after upgrade


#1

Hi Sir,
I installed winlogbeat 1.1.2 on windows server 2012 R2, since it could not generate event_data field ,I uninstalled the winlogbeat service using uninstall-service-winlogbeat.ps1,and replaced it with winlogbeat v5 ,but when I ran the winlogbeat in the command window,it failed to start:

then I tried to register winlogbeat using powershell and started it,also failed:

I don't know why,anyone could help me? thank you so much.


(Andrew Kroh) #2

It looks like there is an issue with the contents of your winlogbeat.yml configuration file. Is there anymore to the error message? Maybe restart from a clean configuration file. Make sure there are no tabs in it.


#3

Hi Andrew,
I redownload the winlogbeat-5.0.0-alpha1-windows-64.zip from elastic.co,and just disable the elasticsearch output section using "#" ,enable logstash section and modify the logstash destination IP and port, run again,also failed:

"C:\ProgramData\winlogbeat\Logs\winlogbeat" log file shows:

2016-04-16T09:37:44+08:00 INFO State will be read from and persisted to C:\ProgramData\winlogbeat.winlogbeat.yml
2016-04-16T09:37:44+08:00 INFO Setup Beat: winlogbeat; Version: 5.0.0-alpha1
2016-04-16T09:37:44+08:00 INFO GeoIP disabled: No paths were set under shipper.geoip.paths
2016-04-16T09:37:44+08:00 INFO Max Retries set to: 3
2016-04-16T09:37:44+08:00 INFO Activated logstash as output plugin.
2016-04-16T09:37:44+08:00 INFO Publisher name: SFCDC01
2016-04-16T09:37:44+08:00 INFO Flush Interval set to: 1s
2016-04-16T09:37:44+08:00 INFO Max Bulk Size set to: 2048
2016-04-16T09:37:44+08:00 INFO winlogbeat cleanup
2016-04-16T09:37:44+08:00 INFO Dumping runtime metrics...
2016-04-16T09:37:44+08:00 INFO cmdline=["winlogbeat.exe"]
2016-04-16T09:37:44+08:00 INFO dropReasons={}
2016-04-16T09:37:44+08:00 INFO ignoredEvents={}
2016-04-16T09:37:44+08:00 INFO libbeatEsPublishEventsCallCount=0
2016-04-16T09:37:44+08:00 INFO libbeatEsPublishedAndAckedEvents=0
2016-04-16T09:37:44+08:00 INFO libbeatEsPublishedButNotAckedEvents=0
2016-04-16T09:37:44+08:00 INFO libbeatKafkaPublishEventsCallCount=0
2016-04-16T09:37:44+08:00 INFO libbeatKafkaPublishedAndAckedEvents=0
2016-04-16T09:37:44+08:00 INFO libbeatKafkaPublishedButNotAckedEvents=0
2016-04-16T09:37:44+08:00 INFO libbeatLogstashPublishEventsCallCount=0
2016-04-16T09:37:44+08:00 INFO libbeatLogstashPublishedAndAckedEvents=0
2016-04-16T09:37:44+08:00 INFO libbeatLogstashPublishedButNotAckedEvents=0
2016-04-16T09:37:44+08:00 INFO libbeatMessagesDropped=0
2016-04-16T09:37:44+08:00 INFO libbeatMessagesInWorkerQueues=0
2016-04-16T09:37:44+08:00 INFO libbeatPublishedEvents=0
2016-04-16T09:37:44+08:00 INFO msgFileCacheStats={}
2016-04-16T09:37:44+08:00 INFO publishedEvents={}
2016-04-16T09:37:44+08:00 INFO uptime={"server_time":"2016-04-16T01:37:44.2401025Z","start_time":"2016-04-16T01:37:44.1151009Z","uptime":"125.0016ms","uptime_ms":"125001"}
2016-04-16T09:37:44+08:00 CRIT Exiting: yaml: control characters are not allowed

how can I send the winlogbeat.yml to you for advanced analysis?
thank you so much .


(Andrew Kroh) #4

Please post your config file here or post a http://pastebin.com link to it.


#5

@andrewkroh

Hi Andrew,
the winlogbeat.yml as follows:
http://pastebin.com/kmLjzQU7


(Andrew Kroh) #6

I didn't find any issues with the config you posted. Maybe the invalid characters were lost in the copy/paste to pastebin.

Try creating a new minimal configuration file using the configuration from the Getting Started guide as your starting point.


(Andrew Kroh) #7

While looking more closely at the log message you posted, I suspect the error is actually related to your registry file and not your config. Look at your registry file.


#8

@andrewkroh

Great!!!:grinning:
I delete C:\ProgramData\winlogbeat,and successfully to start winlogbeat.
thank you so so much~


(Rob Leonard) #9

We have multiple clients installed have noticed nearly each time the service is not allowed or capable of a graceful shutdown (Power failure, system crash, etc) it seems to corrupt the registry file and requires it be deleted to allow the service to start up again afterwards.


(Andrew Kroh) #10

I haven't seen this myself yet. Are you able to reliably reproduce the behavior?

Having debug logs from the process would be helpful so we could see what it was doing when the system stopped/crashed. And also having the corrupt file would be useful too. Maybe we can get some clues from it.

While the process is running it periodically flushes its state to disk if there were changes to the internal state. It does this after 10 updates or after 5 seconds, whichever comes first. It writes the state information to .winlogbeat.yml.new, closes the file, then moves it to .winlogbeat.yml. In this way, if the process dies while writing to the file it shouldn't corrupt the .winlogbeat.yml file. It would have to die while in the MoveFileEx system call to cause problems (but hopefully the move operation is atomic in the filesystem).

On shutdown it performs one final flush to disk if there are any changes that had not been written.


Error:control characters are not allowed
Corrupt winlogbeat.yml checkpoint file
(Rob Leonard) #11

Sadly, no we don’t have a way to reproduce it reliably per say and we ended up doing a mass delete of the reg files to get things going.

Short version, one of our buildings is under construction that building received multiple power blips over the last few weeks. Each time, systems not on a UPS exhibited the same behavior. During our troubleshooting I did snag a copy of the reg file (attached here). In case it doesn’t make it through, numerous (~580) “NUL” chars were written to the file rendering the service incapable of starting returning only the “control characters are not allowed” message.

-Rob


(system) #12

This topic was automatically closed 21 days after the last reply. New replies are no longer allowed.


(Andrew Kroh) #13

(Andrew Kroh) #14

@rgleonard I think I tracked down the cause of this issue and have a fix prepared for it. The details are in https://github.com/elastic/beats/issues/2313#issuecomment-244211872 (spoiler alert: it's the file system cache).


(system) #15