Howdy! We're using the Elastic Agent's synthetic agent running in Docker to monitor some web properties. There are no visible issues or errors, but after some time the agent "dies", for lack of a better term. After some investigation, it looks like the container is running out of disk space because /usr/share/elastic-agent/data/elastic-agent-1d05ba/install/heartbeat-7.17.1-linux-x86_64 is filling up with core.XXX files, which I can only assume are core dumps.
Is there some way we can diagnose this, or something we might be doing wrong?
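For anyone wanting to confirm the same symptom, here is a minimal sketch that counts the core.* files in a directory and totals their size. The helper names (find_core_files, summarize) are made up for illustration, and it assumes Python is available wherever you can reach the container's filesystem.

```python
from pathlib import Path


def find_core_files(directory):
    """Return (path, size_in_bytes) for each core.* file directly under directory, largest first."""
    found = []
    for entry in Path(directory).glob("core.*"):
        if entry.is_file():
            found.append((entry, entry.stat().st_size))
    return sorted(found, key=lambda item: item[1], reverse=True)


def summarize(directory):
    """Return (number_of_core_files, total_size_in_bytes) for directory."""
    files = find_core_files(directory)
    return len(files), sum(size for _, size in files)


# Hypothetical usage, pointing at the heartbeat install directory from the post:
# count, total = summarize("/usr/share/elastic-agent/data/elastic-agent-1d05ba/install/heartbeat-7.17.1-linux-x86_64")
# print(f"{count} core files, {total / 1024**2:.1f} MiB")
```

Running it a few minutes apart should show whether the count is growing with each monitor run.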
Thanks for the heads-up. This is a weird one that we've had some trouble reproducing; it's caused by Chromium crashing. We do have a tracking issue here: [7.17] [Heartbeat] Crash core dump files generated for synthetic monitors · Issue #30426 · elastic/beats · GitHub
@vigneshshanmugam might have some thoughts on next steps, or possibly some debugging questions that may help out here.
I'm wondering if just updating to a newer playwright version for synthetics may help.
Reading over that issue, that's exactly what we're running into. Came to the same conclusion that it was a weird SIGABRT in Chrome, and ended up disabling core dumps in the container to resolve it temporarily.
Which method did you use to disable core dumps in the container? Last I checked, our container was set to disable them, but I'm wondering if you're using a different setting, or if you had overridden that.
Setting core in ulimits to 0, I believe? Not sure if that could be done generically, but I'm also not that familiar, period, with the underlying bits.
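To illustrate what a core limit of 0 means at the process level, here is a rough sketch using Python's standard resource module. It is not how the Elastic Agent container or the poster's setup applies the limit; the function names are hypothetical. It lowers RLIMIT_CORE to 0 and then checks that a child process inherits it.

```python
import resource
import subprocess
import sys


def disable_core_dumps():
    """Set the soft and hard core-file size limits to 0 for this process and its future children."""
    resource.setrlimit(resource.RLIMIT_CORE, (0, 0))
    return resource.getrlimit(resource.RLIMIT_CORE)


def core_limit_seen_by_child():
    """Spawn a child Python process and return the (soft, hard) RLIMIT_CORE pair it inherits."""
    code = "import resource; print(*resource.getrlimit(resource.RLIMIT_CORE))"
    out = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True, check=True
    )
    soft, hard = out.stdout.split()
    return int(soft), int(hard)
```

Because the hard limit is lowered too, an unprivileged child cannot raise it again, which is the behavior you would want for a crashing browser process.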
That's very helpful, thanks for sharing, @RichiCoder. Was it set to something other than 0 prior? On my system it's set to 0 by default. As a heads-up, we're looking into upgrading playwright (which brings in a newer version of Chrome) in Upgrade playwright to 1.17.0 by andrewvc · Pull Request #483 · elastic/synthetics · GitHub
Additionally, I'm considering opening an issue around doing a better job enforcing the ulimit for core dumps, but it'd be great if we can repro this first.
Could you perhaps share your system info and how Docker was set up? I'm trying to figure out if the default core limit of 0 was overridden anywhere.
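One way to check what limit a running process actually has, sketched under the assumption of a Linux container with /proc mounted: read /proc/<pid>/limits and pull out the "Max core file size" row. The helper names here are hypothetical, and you would need to find the heartbeat or Chromium process ID yourself.

```python
import re
from pathlib import Path


def parse_core_limit(limits_text):
    """Extract (soft, hard) for 'Max core file size' from the text of /proc/<pid>/limits."""
    for line in limits_text.splitlines():
        if line.startswith("Max core file size"):
            # Columns are separated by runs of two or more spaces.
            fields = re.split(r"\s{2,}", line.strip())
            return fields[1], fields[2]
    return None


def core_limit_of(pid="self"):
    """Read the core-file limit of a process by PID (Linux only); 'self' means this process."""
    return parse_core_limit(Path(f"/proc/{pid}/limits").read_text())
```

A soft value of 0 means no core file should be written for that process; anything else (including "unlimited") would suggest the default was overridden somewhere along the way.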
Obviously, our first choice would be for chromium not to crash, but that is always a risk, and when it does it'd be best not to leave a core.
I'll be honest, I'm not sure what the default might be.
For context, we're running a relatively vanilla AWS Elastic Container Service cluster, with the container compute being provided by Fargate. So, I'm not super familiar with what the defaults are aside from the odd issue. You could probably reproduce with a bare-bones cluster and the lowest tier task compute, and just run a synthetic at a fast interval.
Glad to see it's being upgraded! I'm a big playwright fan and reached for some recent APIs when writing the tests in question and ran into sharp edges as a result (especially since I ran into another now solved(?) issue where errors weren't bubbling up in an easily visible way).