How seriously does Elastic take product safety / software quality?

Hi there,
Two weeks ago I opened a ticket for a memory leak in Metricbeat.

The memory leak was highly risky for production environments.
It took several days until the problem was recognized, understood and fixed.
There were, and still are, no warnings on the download pages for Metricbeat or Fleet Manager.
As a result, many users will run into the same problem and put their systems at acute risk.
The reference in the release notes is of little help here.

When Craig, as "tech lead", was asked what he thinks about this, he gave the following answer:

I don't have anything to comment besides apologizing that it took so long to be communicated widely. It is now a known issue in the release notes https://www.elastic.co/guide/en/fleet/current/release-notes-8.11.1.html. The initial focus was on identifying and fixing the issue and that delayed the communication unnecessarily.

The 8.11.x downloads are still available for now. This is a severe issue if you rely on Metricbeat or Elastic Agent for process metrics on Windows, but there are many other important uses that aren't affected.

This was clearly the wrong response to potential Enterprise customers and will teach us a lesson.
A very bad way of communicating with potential customers.

Best regards
StefanSa

Hi Stefan,

Thanks for creating this post, and once again, sorry for the inconvenience this has caused on your end.
Even though we did our best, I must admit that we should have communicated more clearly about this.
For your information, here are the actions taken on our side:

  • 8.11.0 and 8.11.1 were updated with a warning message as soon as we found a fix
  • 8.11.2 contains the fix and should be released in the next few days
  • While we work on the detailed test implementation for this, we will also manually double-check the memory consumption of our process
  • We are working to add automated tests that would have caught this problem in addition to manual testing.
  • We are also looking at adding warnings on the download pages in order to inform users when our binaries might cause problems
  • We are discussing how Elastic Agent downgrade should work
  • We are working on a way to ship Elastic Agent binaries outside the current release cycle to help in such an event

If you would like to know more about these, we are happy to go through them in a call.
