Windows metricbeat service stops at random

Hi,

I'm currently deploying my beats for my new cluster and I have the following setup:
Metricbeat > Logstash (port 5044) > Elasticsearch. I'm using Logstash because I want to introduce a Kafka pipeline, but that isn't important here.

However, the Metricbeat (7.4.2) service keeps stopping at random times, most notably when the Logstash service is restarted. It seems that if the Metricbeat service loses contact with the Logstash service, it stops.

Currently I have "fixed" it by automatically restarting the service, but that is not a real solution. Is anyone aware of this problem? I have installed Metricbeat on 7 different hosts, running both Windows 2016 and 2012 R2. All instances are hosted at the same cloud provider and latency is really low. Metricbeat on Linux hasn't shown these problems so far.

Hi @pdeelman, have you noticed any recurring behavior in the logs before Metricbeat stops?
Can you enable debug logging and provide us with the logs around that time?

Hi @MarianaD,

I'm currently running Metricbeat with debug logging enabled. However, it refuses to crash right now. I have changed a few things over the past days of experimenting and I'm retracing my steps to see where it fails. In my previous setup I did not have enough logging information available to deduce any issues.

I'll report back as soon as I have an update. Right now I can't mess too much with the current nodes, since they are running production, so I might have to set up a few experimental nodes. This is pretty frustrating, to be honest.

Hi @MarianaD, Metricbeat finally crashed (I never expected to be happy about that). The debug log is below. It seems to be some kind of race condition while statting the filesystem. I'll keep debug logging running for now to see if it crashes on me again. The full dump is available at: https://pastebin.com/HqfQrivn

2019-11-21T17:04:42.151+0100 DEBUG [system.fsstat] fsstat/fsstat.go:86 filesystem: C:\ total=171272302592, used=39741087744, free=131531214848
fatal error: concurrent map iteration and map write
2019-11-21T17:04:42.152+0100 DEBUG [system.fsstat] fsstat/fsstat.go:86 filesystem: D:\ total=8626176, used=8626176, free=0
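
For anyone reading along: "concurrent map iteration and map write" is raised by the Go runtime when one goroutine ranges over a map while another goroutine writes to it. It is a fatal error rather than a panic, so it cannot be recovered and the whole process exits, which would explain the service simply stopping. Below is a minimal sketch of the general pattern and the usual fix (guarding the map with a mutex). The fsStats type and its methods are hypothetical names made up for illustration; this is not Metricbeat's actual code, and I don't know which map is involved there.

```go
package main

import (
	"fmt"
	"sync"
)

// fsStats is a hypothetical stand-in for per-filesystem state that is
// shared between goroutines. It is not taken from the Beats code base.
type fsStats struct {
	mu   sync.RWMutex
	used map[string]uint64
}

func newFSStats() *fsStats {
	return &fsStats{used: make(map[string]uint64)}
}

// Set records the used bytes for a mount point under the write lock.
func (s *fsStats) Set(mount string, used uint64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.used[mount] = used
}

// Total iterates over the map under the read lock, so no writer can
// modify the map while the range loop is running.
func (s *fsStats) Total() uint64 {
	s.mu.RLock()
	defer s.mu.RUnlock()
	var total uint64
	for _, u := range s.used {
		total += u
	}
	return total
}

func main() {
	stats := newFSStats()
	var wg sync.WaitGroup
	wg.Add(2)

	// Writer: keeps updating the map.
	go func() {
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			stats.Set(`C:\`, 39741087744)
			stats.Set(`D:\`, 8626176)
		}
	}()

	// Reader: keeps iterating over the same map. Without the mutex,
	// this combination can abort the process with
	// "fatal error: concurrent map iteration and map write".
	go func() {
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			_ = stats.Total()
		}
	}()

	wg.Wait()
	fmt.Println(stats.Total())
}
```

Because the runtime only detects the conflict when the two accesses happen to overlap, a bug like this tends to show up intermittently and more often as the collection period gets shorter.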

@MarianaD BTW, I can confirm the crash is happening again. I've lowered the metric collection period for fsstat to 10 seconds and now it crashes pretty reliably once every few hours. It's the same crash log, so I won't bother pasting it.

Please let me know if you need any additional information.

@pdeelman, thank you for reproducing and debugging the issue. At this point I suggest opening an issue in the beats repo (https://github.com/elastic/beats) so we can follow up on this.
