What is the ElasticMachine GitHub Bot?

Hey folks,

I've been researching tools for monitoring flaky tests and noticed some pretty neat activity from elasticmachine.

By the looks of it, this bot does a ton of operational work for elastic.co and is closed source.

I'm curious, though: could you tell us about its test monitoring features? (I think I saw it add a test report to a PR somewhere but can't find it now.)

I'm planning an open source project to monitor tests across large repos, so if anyone who has worked on this is available to chat, I'd love your input.

Thanks!

Hey @alex-treebeard, thanks for the question and for your interest in our work!

Most of the tooling that we use for the bots is available in our pipeline library for Jenkins, which is open-source and can be seen here:

GitHub - elastic/apm-pipeline-library: Jenkins pipeline shared library for the APM project.

The flaky test analyzer is not currently part of that library, but what it does is more or less what you might imagine: it looks through branch test results and applies heuristics to find tests that fail intermittently. More recently, we've also been experimenting with more advanced analysis that attempts to find build failures and categorize them by type (test-related, CI-related, infrastructure-related, etc.) by querying a mix of the Jenkins API and the test results stored in Elasticsearch. If I get a bit more time, I'll try to put together a blog post or something on how that all works.
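For readers wondering what "applies heuristics" might look like in practice, here is a minimal Python sketch. It is not Elastic's analyzer (which isn't public); the `TestResult` record, the `find_flaky_tests` function, and the `min_flip_rate`/`min_runs` thresholds are all hypothetical. It assumes results arrive in chronological order and uses two common heuristics: a test that both passed and failed on the same commit, or a test whose outcome flips often between consecutive runs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TestResult:
    # Hypothetical record: one execution of one test on one commit.
    test_name: str
    commit: str
    passed: bool


def find_flaky_tests(results, min_flip_rate=0.2, min_runs=5):
    """Return names of tests that look flaky (results assumed chronological)."""
    by_test = defaultdict(list)
    for r in results:
        by_test[r.test_name].append(r)

    flaky = set()
    for name, runs in by_test.items():
        # Heuristic 1: the same commit produced both a pass and a fail,
        # so the code did not change but the outcome did.
        outcomes_per_commit = defaultdict(set)
        for r in runs:
            outcomes_per_commit[r.commit].add(r.passed)
        if any(len(outcomes) == 2 for outcomes in outcomes_per_commit.values()):
            flaky.add(name)
            continue

        # Heuristic 2: frequent pass/fail flips between consecutive runs.
        # A genuine regression usually flips once and then stays red.
        if len(runs) >= min_runs:
            flips = sum(1 for a, b in zip(runs, runs[1:]) if a.passed != b.passed)
            if flips / (len(runs) - 1) >= min_flip_rate:
                flaky.add(name)
    return flaky


history = [
    TestResult("test_login", "abc123", False),
    TestResult("test_login", "abc123", True),  # retried on the same commit, passed
    TestResult("test_search", "abc123", True),
    TestResult("test_search", "def456", True),
]
print(find_flaky_tests(history))  # {'test_login'}
```

The flip-rate threshold is the main design knob: set it too low and real regressions get misreported as flaky; set it too high and only wildly unstable tests are caught.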

If you have specific questions in the meantime, I'm happy to answer them. Cheers and good luck with your project! I think there's a lot of interest in something like what you're building, and I'll be very interested to see what you come up with.


Thanks so much for the information, Mike!

I do actually have two follow-up questions if that's okay:

  1. Do you have a separate web dashboard where people can get information about test flakiness, in addition to what's posted on pull requests? I'm wondering which medium is most effective for surfacing richer test output; perhaps there are already too many places to get operational information from.
  2. What would be the most painful symptom your teams would experience if you didn't have the test monitoring tooling you've built?

Definitely interested in any blog post you write on this.