Metricbeat 7.15.1 Will Not Start

ty9001 · November 4, 2021, 3:31pm

Hello! Over the past few weeks I have noticed that my Metricbeat 7.15.1 monitoring doesn't seem to work. I finally figured out that the service won't stay running. I was using Fedora 34 and have now rolled out a new server running Fedora 35 to replace the old one. When I run 'sudo metricbeat' I receive the following output. Has anyone else seen anything like this? There is far more output, but I'm at the character limit. Please let me know if I need to provide any more information. Thanks!

sudo metricbeat
runtime/cgo: pthread_create failed: Operation not permitted
SIGABRT: abort
PC=0x7f5516bbf85c m=7 sigcode=18446744073709551610

goroutine 0 [idle]:
runtime: unknown pc 0x7f5516bbf85c
stack: frame={sp:0x7f54ed465790, fp:0x0} stack=[0x7f54ecc661e8,0x7f54ed465de8)
00007f54ed465690:  00005646d9251e88 <runtime.adjustframe+136>  00007f54ed465a10
00007f54ed4656a0:  00007f54ed465b18  00007f54ed465b01
00007f54ed4656b0:  0000000000000000  0000000000000000
00007f54ed4656c0:  0000000000000000  0000000000000000
00007f54ed4656d0:  0000000000000000  0000000000000000
00007f54ed4656e0:  0000000000000000  00005646df7664e0
00007f54ed4656f0:  00005646000233d7  0000000000000000
00007f54ed465700:  00007f54ed465878  00005646d9267b01 <runtime.markroot.func1+385>
00007f54ed465710:  0000000000000000  00005646df7664e0
00007f54ed465720:  00005646dec920a0  00007f5400000000
00007f54ed465730:  0000000000000000  00007f5400000000
00007f54ed465740:  0000000000000000  00007f54ed465a68
00007f54ed465750:  00005646d925fe55 <runtime.gentraceback+4405>  00007f54ed465a10
00007f54ed465760:  00007f54ed465b00  00005646d9273001 <runtime.memhash32+65>
00007f54ed465770:  00007f54ed465878  00007f5516bcce59
00007f54ed465780:  0000000000000000  00007f5516bbf84e
00007f54ed465790: <0000000000000130  0000000100000000
00007f54ed4657a0:  0000000000000120  0000000000000000
00007f54ed4657b0:  0000000000000013  0000000000000000
00007f54ed4657c0:  000000c000071fd0  0000000000000004
00007f54ed4657d0:  0000003400000013  00007f5516c426f1
00007f54ed4657e0:  00007f54cffff640  00007f54ed465aa0
00007f54ed4657f0:  00007f54ed46591e  00007f54ed46591f
00007f54ed465800:  00007f54cffff640  00007f5516bbd785
00007f54ed465810:  00007f54d0000020  66f54dee55c2d700
00007f54ed465820:  00007f54ed466640  0000000000000006
00007f54ed465830:  00000000000000f1  0000000000000000
00007f54ed465840:  00005646dc71df7e  00007f5516b726b6
00007f54ed465850:  00007f5516d2c990  00007f5516b5c7d3
00007f54ed465860:  0000000000000020  0000000000000000
00007f54ed465870:  0000000000000000  00007f5516c31904
00007f54ed465880:  00007f54cf7ff000  000000000000000d

rax    0x0
rbx    0x7f54ed466640
rcx    0x7f5516bbf85c
rdx    0x6
rdi    0xa75
rsi    0xa7b
rbp    0xa7b
rsp    0x7f54ed465790
r8     0x7f54ed465860
r9     0x7f5516ce64e0
r10    0x8
r11    0x246
r12    0x6
r13    0x0
r14    0x5646dc71df7e
r15    0x0
rip    0x7f5516bbf85c
rflags 0x246
cs     0x33
fs     0x0
gs     0x0

riferrei · November 4, 2021, 3:56pm

Not being able to create a POSIX thread (pthread) on Linux seems to be a severe problem. I would start by asking you to provide more details about your environment, such as:

CPU architecture
Operating System
Kernel Details
Metricbeat distribution
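The text after "pthread_create failed:" is the C library's strerror() message for the error code pthread_create returned (the Go runtime's cgo layer prints it that way). The sketch below, built around a hypothetical helper named explain, maps that text back to the error codes the Linux pthread_create(3) man page documents. Worth noting: per that man page, hitting a resource or thread limit returns EAGAIN ("Resource temporarily unavailable"), whereas this log shows EPERM, so a permission or sandboxing restriction may be worth ruling out alongside the limits.

```python
import errno
import os

# Error codes documented for pthread_create(3) on Linux, with the man page's
# reason for each. The Go runtime's cgo layer prints strerror() of the code
# it got back, so the text after "pthread_create failed:" identifies it.
PTHREAD_CREATE_ERRORS = {
    "EAGAIN": (errno.EAGAIN, "insufficient resources or a system-imposed thread limit"),
    "EINVAL": (errno.EINVAL, "invalid settings in attr"),
    "EPERM": (errno.EPERM, "no permission to set the scheduling policy/parameters in attr"),
}


def explain(message: str) -> str:
    """Map a strerror() message from a crash log back to an errno name."""
    for name, (code, reason) in PTHREAD_CREATE_ERRORS.items():
        if os.strerror(code) == message:
            return f"{name}: {reason}"
    return "not an error code documented for pthread_create"


if __name__ == "__main__":
    # The message from the crash log in this thread.
    print(explain("Operation not permitted"))
```

Run against the message from this crash, it reports EPERM rather than EAGAIN.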

Also, could you try running Metricbeat this way and share whatever output it shows:

metricbeat -e -d "*"

You can paste the content in a Gist entry and provide the link here.

Thanks,

— @riferrei

ty9001 · November 4, 2021, 4:16pm

Thanks!

CPU Architecture: (Virtual Machine) AMD Ryzen 5 2400G
Operating System: Fedora 35
Kernel: 5.14.14-300.fc35.x86_64
Metricbeat distribution: 7.15.1, installed from the Elastic repo

Here is the Gist dump:

https://gist.github.com/Ty9000/b1ffd35306dfc97da8a0fb6b57aeacee

2021-11-04T12:12:25.212-0400    INFO    instance/beat.go:665    Home path: [/usr/share/metricbeat] Config path: [/etc/metricbeat] Data path: [/var/lib/metricbeat] Logs path: [/var/log/metricbeat]
2021-11-04T12:12:25.212-0400    DEBUG   [beat]  instance/beat.go:723    Beat metadata path: /var/lib/metricbeat/meta.json
2021-11-04T12:12:25.212-0400    INFO    instance/beat.go:673    Beat ID: a136c134-0572-49d4-a7d9-5d84cb9ddec2
2021-11-04T12:12:25.214-0400    DEBUG   [docker]        docker/client.go:48     Docker client will negotiate the API version on the first request.
2021-11-04T12:12:25.214-0400    DEBUG   [add_cloud_metadata]    add_cloud_metadata/providers.go:128     add_cloud_metadata: starting to fetch metadata, timeout=3s
2021-11-04T12:12:25.215-0400    DEBUG   [add_docker_metadata]   add_docker_metadata/add_docker_metadata.go:86   add_docker_metadata: docker environment not detected: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
2021-11-04T12:12:25.215-0400    DEBUG   [kubernetes]    add_kubernetes_metadata/kubernetes.go:138       Could not create kubernetes client using in_cluster config: unable to build kube config due to error: invalid configuration: no configuration has been provided, try setting KUBERNETES_MASTER environment variable    {"libbeat.processor": "add_kubernetes_metadata"}
2021-11-04T12:12:28.215-0400    DEBUG   [add_cloud_metadata]    add_cloud_metadata/providers.go:172     add_cloud_metadata: timed-out waiting for all responses
2021-11-04T12:12:28.215-0400    DEBUG   [add_cloud_metadata]    add_cloud_metadata/providers.go:131     add_cloud_metadata: fetchMetadata ran for 3.000204645s
2021-11-04T12:12:28.215-0400    INFO    [add_cloud_metadata]    add_cloud_metadata/add_cloud_metadata.go:101    add_cloud_metadata: hosting provider type not detected.

This file has been truncated.

riferrei · November 4, 2021, 4:41pm

How reproducible is this problem? That is, does it fail every single time you run it, or only eventually, after multiple executions and restarts?

I wonder whether you are running out of resources and are therefore unable to create POSIX threads. Could you please run ulimit -a and share the results here?

I've seen some aggressive VM policies applied to guest operating systems that cap their ability to create processes and open files (file descriptors). It is important to rule that out first.
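Because ulimit -a reports the limits of the shell it runs in, and a systemd service takes its limits from its unit rather than from the login shell, it may be more telling to read the limits of the running Metricbeat process itself from /proc/<pid>/limits. A minimal sketch, assuming a Linux host; read_limits is a hypothetical helper name:

```python
def read_limits(pid: str = "self") -> dict[str, tuple[str, str]]:
    """Parse /proc/<pid>/limits into {limit name: (soft, hard)}.

    The kernel pads the limit name to a fixed 25-character column followed
    by a space, so the name is sliced off by position and the remaining
    columns are split on whitespace (the "Units" column is absent for some
    rows, such as "Max nice priority").
    """
    limits = {}
    with open(f"/proc/{pid}/limits") as fh:
        next(fh)  # skip the "Limit  Soft Limit  Hard Limit  Units" header
        for line in fh:
            name, rest = line[:26].strip(), line[26:].split()
            if len(rest) >= 2:
                limits[name] = (rest[0], rest[1])
    return limits


if __name__ == "__main__":
    for name in ("Max processes", "Max open files"):
        soft, hard = read_limits()[name]
        print(f"{name}: soft={soft} hard={hard}")
```

For example, passing the PID reported by systemctl show -p MainPID metricbeat shows the soft and hard "Max processes" and "Max open files" values Metricbeat is actually running with, which can differ from what ulimit -a prints in an interactive shell.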

ty9001 · November 4, 2021, 5:13pm

It's very strange. I increased the total amount of RAM allocated to the machine and it somewhat fixes it, but not really. It now has 10 GB in total: 4 GB used by Elasticsearch, 1.5 GB by Kibana, and the rest available to the system. If I reboot the system, Metricbeat starts up properly and just complains (for some reason) that it can't access localhost:9200. If I restart the service, it crashes every time with the huge, long error. This happens until I reboot the server, and then the cycle starts again.

Here is the output of ulimit -a:

real-time non-blocking time  (microseconds, -R) unlimited
core file size              (blocks, -c) unlimited
data seg size               (kbytes, -d) unlimited
scheduling priority                 (-e) 0
file size                   (blocks, -f) unlimited
pending signals                     (-i) 39385
max locked memory           (kbytes, -l) 64
max memory size             (kbytes, -m) unlimited
open files                          (-n) 1024
pipe size                (512 bytes, -p) 8
POSIX message queues         (bytes, -q) 819200
real-time priority                  (-r) 0
stack size                  (kbytes, -s) 8192
cpu time                   (seconds, -t) unlimited
max user processes                  (-u) 39385
virtual memory              (kbytes, -v) unlimited
file locks                          (-x) unlimited

riferrei · November 4, 2021, 5:40pm

In a sense, it is "good" to know that everything works after a reboot. It suggests the problem is related to resource contention and/or starvation: something is keeping processes from giving back to the OS the resources they use. Unfortunately, this also means that identifying the cause is a bit hard.
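If something is holding on to resources between service restarts, one low-effort check is to compare the number of threads on the system before and after a restart against one of the kernel's system-wide caps, kernel.threads-max. A minimal sketch, assuming a Linux host; system_thread_usage is a hypothetical function name:

```python
def system_thread_usage() -> tuple[int, int]:
    """Return (threads currently existing, kernel.threads-max).

    The fourth field of /proc/loadavg is "runnable/total", where total is
    the number of kernel scheduling entities (threads) on the system.
    """
    with open("/proc/loadavg") as fh:
        total = int(fh.read().split()[3].split("/")[1])
    with open("/proc/sys/kernel/threads-max") as fh:
        threads_max = int(fh.read())
    return total, threads_max


if __name__ == "__main__":
    used, limit = system_thread_usage()
    print(f"{used} threads in use of {limit} ({used / limit:.1%})")
```

Running it right after a reboot, then again after each failed service restart, would show whether the thread count keeps climbing.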

I would start by isolating the processes. If possible, leave only Metricbeat running in this VM and move Elasticsearch and Kibana to other VMs. See how this affects resource consumption. Then keep an eye on the output of ulimit -a as you play with Metricbeat. I would look specifically at the "open files" field, which is currently capped at 1024.
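To watch actual usage (as opposed to the configured caps that ulimit -a prints) while exercising Metricbeat, one option is to sample the process's open file descriptors and thread count from /proc. A minimal sketch assuming Linux; process_usage is a hypothetical name, and reading another user's /proc/<pid>/fd typically requires root:

```python
import os


def process_usage(pid: str = "self") -> dict[str, int]:
    """Count open file descriptors and threads for one process via /proc."""
    # Each entry in /proc/<pid>/fd is one open file descriptor.
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    threads = 0
    # /proc/<pid>/status has a "Threads:" line with the current thread count.
    with open(f"/proc/{pid}/status") as fh:
        for line in fh:
            if line.startswith("Threads:"):
                threads = int(line.split()[1])
                break
    return {"open_fds": open_fds, "threads": threads}


if __name__ == "__main__":
    print(process_usage())
```

Sampling this for the Metricbeat PID every few seconds and comparing open_fds against the "open files" soft limit would show whether the 1024 cap is actually being approached.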

See if you can isolate what causes this starvation and we can move from there.

— @riferrei

ty9001 · November 4, 2021, 10:18pm

I created a brand new server running Fedora 35, set up the Elastic repo, and installed only Metricbeat. That server has 4 cores and 4 GB of RAM, so it has plenty of resources for Metricbeat alone. It looks a little better, but I caught it a few times with similar errors:

https://gist.github.com/Ty9000/ea107eee26bc50285118626d60ae324e

Nov  4 14:06:13 fedora metricbeat[1084]: 2021-11-04T14:06:13.235-0400#011INFO#011cfgfile/reload.go:227#011Dynamic config reloader stopped
Nov  4 14:06:13 fedora metricbeat[1084]: 2021-11-04T14:06:13.235-0400#011INFO#011[reload]#011cfgfile/list.go:129#011Stopping 3 runners ...
Nov  4 14:06:13 fedora metricbeat[1084]: runtime/cgo: pthread_create failed: Operation not permitted
Nov  4 14:06:13 fedora metricbeat[1084]: SIGABRT: abort
Nov  4 14:06:13 fedora metricbeat[1084]: PC=0x7f498f4ca85c m=6 sigcode=18446744073709551610
Nov  4 14:06:13 fedora metricbeat[1084]: goroutine 0 [idle]:
Nov  4 14:06:13 fedora metricbeat[1084]: runtime: unknown pc 0x7f498f4ca85c
Nov  4 14:06:13 fedora metricbeat[1084]: stack: frame={sp:0x7f4966f227d0, fp:0x0} stack=[0x7f49667231e8,0x7f4966f22de8)
Nov  4 14:06:13 fedora metricbeat[1084]: 00007f4966f226d0:  0000000000000000  0000000000000000
Nov  4 14:06:13 fedora metricbeat[1084]: 00007f4966f226e0:  0000000000000000  00005592c6d664e0

This file has been truncated.
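The Gist excerpt shows Beats' tab-separated log fields with each tab rendered as #011, which is rsyslog's octal escape for a control character. To filter or grep these lines, a small decoder can help. This is a sketch assuming the "host program[pid]: message" syslog layout seen in the Gist; decode_line is a hypothetical function name:

```python
import re

# rsyslog writes control characters as '#' plus a 3-digit octal code, so a
# tab (ASCII 9) appears as "#011" in /var/log/messages.
OCTAL_ESCAPE = re.compile(r"#([0-7]{3})")


def decode_line(line: str) -> list[str]:
    """Split the Beats part of a syslog line into its tab-separated fields."""
    # Drop the "Nov  4 14:06:13 fedora metricbeat[1084]: " syslog prefix.
    _, _, message = line.partition("]: ")
    # Turn each "#NNN" octal escape back into the character it stands for.
    message = OCTAL_ESCAPE.sub(lambda m: chr(int(m.group(1), 8)), message)
    return message.split("\t")


if __name__ == "__main__":
    sample = ("Nov  4 14:06:13 fedora metricbeat[1084]: "
              "2021-11-04T14:06:13.235-0400#011INFO#011"
              "cfgfile/reload.go:227#011Dynamic config reloader stopped")
    print(decode_line(sample))
```

Lines written directly by the Go runtime, such as the pthread_create failure, contain no escaped tabs and come back as a single field.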

stephenb · November 5, 2021, 12:34am

Maybe I am missing something completely, but Fedora is not a supported OS according to our support matrix, although I understand it is very similar to CentOS.

Have you tried starting it both with and without sudo?

Which modules have you enabled?

ty9001 · November 5, 2021, 12:21pm

Yes, sir - Fedora is very similar to CentOS and RHEL.

Without sudo, I am unable to start the service at all due to permissions; sudo is required to start or stop services on my system.

No modules are enabled other than the defaults.

system · December 3, 2021, 12:21pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Replies	Views	Activity
Metricbeat service get stopped after service start (Metrics)	7	629	July 20, 2023
Metricbeat on RHEL 9 / Alma 9 / Rocky 9 (Beats, metricbeat)	4	738	January 7, 2023
Metricbeat on one of my Windows servers fails to start (Beats, metricbeat)	7	2456	December 14, 2016
Metricbeat 7.9.2 service doesn't start on windows 2019, but fine on linux (Beats, metricbeat)	19	2529	December 16, 2020
Metricbeats does not start Win2012 R1 x64 (Beats)	5	488	March 10, 2017