Hello All,
So I've been running the full stack (Metricbeat, Logstash, Elasticsearch, Kibana) on 7.2. Everything runs on Linux, except that most of the Metricbeat clients are on Windows machines. The Beats-to-Logstash connections are encrypted with TLS. Most of the time this works fine, but every couple of days Metricbeat simply terminates, and I have no idea why. For example, I'm running Metricbeat on a Windows 10 machine, and its local logs have nothing to say. The last few lines:
```
2019-08-02T19:04:36.642-0500	INFO	[monitoring]	log/log.go:145	Non-zero metrics in the last 30s	{"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":251843,"time":{"ms":47}},"total":{"ticks":708046,"time":{"ms":204},"value":708046},"user":{"ticks":456203,"time":{"ms":157}}},"handles":{"open":325},"info":{"ephemeral_id":"3c30c004-b106-4923-885e-5b74f52c9df6","uptime":{"ms":202710851}},"memstats":{"gc_next":10458160,"memory_alloc":7656952,"memory_total":87723342944,"rss":8192},"runtime":{"goroutines":39}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"acked":54,"batches":6,"total":54},"read":{"bytes":210},"write":{"bytes":18215}},"pipeline":{"clients":3,"events":{"active":0,"published":54,"total":54},"queue":{"acked":54}}},"metricbeat":{"system":{"cpu":{"events":3,"success":3},"filesystem":{"events":2,"success":2},"fsstat":{"events":1,"success":1},"memory":{"events":3,"success":3},"network":{"events":9,"success":9},"process":{"events":30,"success":30},"process_summary":{"events":3,"success":3},"socket_summary":{"events":3,"success":3}}}}}}
2019-08-02T19:05:06.647-0500	INFO	[monitoring]	log/log.go:145	Non-zero metrics in the last 30s	{"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":251875,"time":{"ms":32}},"total":{"ticks":708109,"time":{"ms":63},"value":708109},"user":{"ticks":456234,"time":{"ms":31}}},"handles":{"open":325},"info":{"ephemeral_id":"3c30c004-b106-4923-885e-5b74f52c9df6","uptime":{"ms":202740854}},"memstats":{"gc_next":10493824,"memory_alloc":7585720,"memory_total":87735873120},"runtime":{"goroutines":39}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"acked":47,"batches":6,"total":47},"read":{"bytes":210},"write":{"bytes":17004}},"pipeline":{"clients":3,"events":{"active":0,"published":47,"total":47},"queue":{"acked":47}}},"metricbeat":{"system":{"cpu":{"events":3,"success":3},"memory":{"events":3,"success":3},"network":{"events":9,"success":9},"process":{"events":26,"success":26},"process_summary":{"events":3,"success":3},"socket_summary":{"events":3,"success":3}}}}}}
2019-08-02T19:05:36.645-0500	INFO	[monitoring]	log/log.go:145	Non-zero metrics in the last 30s	{"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":251953,"time":{"ms":78}},"total":{"ticks":708218,"time":{"ms":109},"value":708218},"user":{"ticks":456265,"time":{"ms":31}}},"handles":{"open":325},"info":{"ephemeral_id":"3c30c004-b106-4923-885e-5b74f52c9df6","uptime":{"ms":202770851}},"memstats":{"gc_next":10496816,"memory_alloc":7687672,"memory_total":87749289856,"rss":-184320},"runtime":{"goroutines":39}},"libbeat":{"config":{"module":{"running":0}},"output":{"events":{"acked":53,"batches":6,"total":53},"read":{"bytes":210},"write":{"bytes":18448}},"pipeline":{"clients":3,"events":{"active":0,"published":53,"total":53},"queue":{"acked":53}}},"metricbeat":{"system":{"cpu":{"events":3,"success":3},"filesystem":{"events":2,"success":2},"fsstat":{"events":1,"success":1},"memory":{"events":3,"success":3},"network":{"events":9,"success":9},"process":{"events":29,"success":29},"process_summary":{"events":3,"success":3},"socket_summary":{"events":3,"success":3}}}}}}
```
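One way to read these monitoring lines is to decode their JSON payloads and compare a few fields across samples. The Python sketch below assumes the plain-text (non-JSON) log format with tab-separated fields and the JSON payload last. That assumption would also explain why the forum turned the paste into a table. It uses the three lines from this post, trimmed to the fields it reads. On these samples, uptime is about 56.3 hours (roughly 2.3 days), consistent with "every couple of days". Open handles (325) and goroutines (39) are flat across all three samples.

```python
import json

# Three monitoring lines from the post, trimmed to the fields used below.
# Assumption: fields are tab-separated, with the JSON payload as the last field.
LINES = [
    "\t".join([
        "2019-08-02T19:04:36.642-0500", "INFO", "[monitoring]", "log/log.go:145",
        "Non-zero metrics in the last 30s",
        '{"monitoring": {"metrics": {"beat":{"handles":{"open":325},'
        '"info":{"uptime":{"ms":202710851}},"memstats":{"memory_alloc":7656952},'
        '"runtime":{"goroutines":39}}}}}',
    ]),
    "\t".join([
        "2019-08-02T19:05:06.647-0500", "INFO", "[monitoring]", "log/log.go:145",
        "Non-zero metrics in the last 30s",
        '{"monitoring": {"metrics": {"beat":{"handles":{"open":325},'
        '"info":{"uptime":{"ms":202740854}},"memstats":{"memory_alloc":7585720},'
        '"runtime":{"goroutines":39}}}}}',
    ]),
    "\t".join([
        "2019-08-02T19:05:36.645-0500", "INFO", "[monitoring]", "log/log.go:145",
        "Non-zero metrics in the last 30s",
        '{"monitoring": {"metrics": {"beat":{"handles":{"open":325},'
        '"info":{"uptime":{"ms":202770851}},"memstats":{"memory_alloc":7687672},'
        '"runtime":{"goroutines":39}}}}}',
    ]),
]


def parse_line(line):
    """Split one monitoring log line and pull a few fields out of its JSON payload."""
    timestamp, _level, _logger, _caller, _message, payload = line.split("\t", 5)
    beat = json.loads(payload)["monitoring"]["metrics"]["beat"]
    return {
        "timestamp": timestamp,
        "uptime_h": beat["info"]["uptime"]["ms"] / 3_600_000,  # ms -> hours
        "open_handles": beat["handles"]["open"],
        "goroutines": beat["runtime"]["goroutines"],
        "memory_alloc": beat["memstats"]["memory_alloc"],  # bytes
    }


def parse_file(path):
    """Parse every periodic 'Non-zero metrics' line in a plain-text Metricbeat log."""
    with open(path, encoding="utf-8") as fh:
        return [parse_line(line.rstrip("\r\n")) for line in fh
                if "Non-zero metrics in the last" in line]


if __name__ == "__main__":
    # Print one summary row per sample so trends are easy to eyeball.
    for row in map(parse_line, LINES):
        print(f"{row['timestamp']}  uptime={row['uptime_h']:.2f} h  "
              f"handles={row['open_handles']}  goroutines={row['goroutines']}  "
              f"alloc={row['memory_alloc']} B")
```

`parse_file` applies the same check to a whole log file. No path is assumed, because the log location depends on how the service was installed. Running it over the hours before a termination would show whether any of these values drift.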
I looked at the Windows event logs, and just three seconds after that last log line they showed:
I haven't found anything else that provides useful information, and I can't find a way to reproduce this. I will turn on debug-level logging so that I have more detail the next time it happens.
Thanks in advance for any help given.