Elasticsearch is failing to start on Docker since 7.13

Hi,

I've been having trouble starting Elasticsearch in Docker containers since 7.13. I can't find anything related in the release notes or the breaking changes.

Background: I'm using custom Ansible roles to install and manage Elasticsearch, and they usually work like a charm. My problem is that I'm using Molecule to test the roles, and these tests have been failing since 7.13 became available (same with 7.14).

For those unfamiliar with Molecule: it's a simple and powerful way to test Ansible roles. Molecule fires up a container and uses the role to configure it as if it were a real host (in this case an Elasticsearch node). It has some peculiarities, especially when it comes to testing systemd services, but I'm fairly confident I've found a way around those.

I'm asking because Elasticsearch fails with "Error: Could not find or load main class [0.001s][warning][os,container]". When I search for that error, I only find threads about running Elasticsearch on Windows, or about custom tarball installations with a broken $JAVA_HOME from the days when Elasticsearch didn't bundle its own JDK.

Here's the full output:

  TASK [ansible-role-elasticsearch : Show logs] **********************************
  ok: [elasticsearch_default] => {
      "es_output.stdout_lines": [
          "● elasticsearch.service - Elasticsearch",
          "   Loaded: loaded (/usr/lib/systemd/system/elasticsearch.service; disabled; vendor preset: disabled)",
          "   Active: failed (Result: exit-code) since Wed 2021-08-25 17:17:04 UTC; 1s ago",
          "     Docs: https://www.elastic.co",
          "  Process: 2065 ExecStart=/usr/share/elasticsearch/bin/systemd-entrypoint -p ${PID_DIR}/elasticsearch.pid --quiet (code=exited, status=1/FAILURE)",
          " Main PID: 2065 (code=exited, status=1/FAILURE)",
          "",
          "Aug 25 17:17:02 elasticsearch_default systemd-entrypoint[2065]: [0.001s][warning][os,container] Duplicate cpuset controllers detected. Picking /sys/fs/cgroup/cpuset, skipping /sys/fs/cgroup/cpuset.",
          "Aug 25 17:17:02 elasticsearch_default systemd-entrypoint[2065]: [0.001s][warning][os,container] Duplicate cpuset controllers detected. Picking /sys/fs/cgroup/cpuset, skipping /sys/fs/cgroup/cpuset.",
          "Aug 25 17:17:03 elasticsearch_default systemd-entrypoint[2065]: [0.001s][warning][os,container] Duplicate cpuset controllers detected. Picking /sys/fs/cgroup/cpuset, skipping /sys/fs/cgroup/cpuset.",
          "Aug 25 17:17:03 elasticsearch_default systemd-entrypoint[2065]: [0.001s][warning][os,container] Duplicate cpuset controllers detected. Picking /sys/fs/cgroup/cpuset, skipping /sys/fs/cgroup/cpuset.",
          "Aug 25 17:17:04 elasticsearch_default systemd-entrypoint[2065]: Error: Could not find or load main class [0.001s][warning][os,container]",
          "Aug 25 17:17:04 elasticsearch_default systemd-entrypoint[2065]: Caused by: java.lang.ClassNotFoundException: [0.001s][warning][os,container]",
          "Aug 25 17:17:04 elasticsearch_default systemd[1]: elasticsearch.service: main process exited, code=exited, status=1/FAILURE",
          "Aug 25 17:17:04 elasticsearch_default systemd[1]: Failed to start Elasticsearch.",
          "Aug 25 17:17:04 elasticsearch_default systemd[1]: Unit elasticsearch.service entered failed state.",
          "Aug 25 17:17:04 elasticsearch_default systemd[1]: elasticsearch.service failed."
      ]
  }
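Note that the "main class" named in the error is literally the JVM unified-logging prefix ([uptime][level][tags]) of the cpuset warning printed just before it. One plausible reading, not confirmed anywhere in this thread, is that a startup script captures the output of a helper JVM and splices it unquoted into the final java command line, so the warning text lands where the launcher expects the main class. The Python sketch below models only that hypothesis: it word-splits a captured string and returns the first argument not starting with "-", which is a simplified model of how the java launcher picks the main class. The function names, the option string, and the placeholder class org.example.Main are all hypothetical.

```python
from typing import List, Optional


def split_like_unquoted_shell(captured: str) -> List[str]:
    """Split captured output on whitespace, like an unquoted $(...) expansion."""
    return captured.split()


def guess_main_class(args: List[str]) -> Optional[str]:
    """Return the first argument that does not start with '-'.

    Simplified model of the java launcher; it ignores options such as -cp
    that consume the following argument.
    """
    for arg in args:
        if not arg.startswith("-"):
            return arg
    return None


# Hypothetical helper-JVM output: the cgroup warning from the log above
# printed on the same stream as the computed JVM options.
warning = (
    "[0.001s][warning][os,container] Duplicate cpuset controllers detected. "
    "Picking /sys/fs/cgroup/cpuset, skipping /sys/fs/cgroup/cpuset."
)
options = "-Xms1g -Xmx1g"

# Clean case: options followed by a placeholder main class.
print(guess_main_class(split_like_unquoted_shell(options + " org.example.Main")))
# prints org.example.Main

# Polluted case: the warning text comes first and becomes the "main class".
print(guess_main_class(split_like_unquoted_shell(warning + "\n" + options + " org.example.Main")))
# prints [0.001s][warning][os,container]
```

If this reading holds, it would also explain why only containers whose cgroup layout triggers the duplicate-cpuset warning are affected, but that still needs to be checked against the actual startup scripts.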

If you want to have a look at the details, the repository where I'm trying to work around this is on GitHub. Please ignore all the odd output tasks I added; the point is that the role works on VMs and "real" hosts but fails only in this setup.

Cheers,
Thomas

Can you grab the actual logs from the Elasticsearch process from the container?

Not easily, but I'll try. The problem is that Molecule starts the container and the service, and when the run fails, it destroys the container right afterwards. In this setup the container runs on a GitHub runner, so I have no shell access.

I know I could run it locally, but I had some issues with that. I'll try and get back to you.
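Since the container is gone by the time the CI job finishes, the journal lines printed by the "Show logs" task are the only evidence. A small helper like the Python sketch below could be used to check captured output for the two symptoms visible in the log above: the duplicate-cpuset warning and a "main class" that looks like a JVM log prefix. The function name and regular expressions are assumptions derived from this log, not part of Molecule, Ansible, or Elasticsearch.

```python
import re
from typing import Dict, Iterable

# Patterns copied from the journal output quoted above.
CPUSET_WARNING = re.compile(
    r"\[warning\]\[os,container\] Duplicate cpuset controllers detected"
)
# A "main class" that is really a JVM unified-logging prefix like [0.001s][warning].
LOG_PREFIX_AS_MAIN_CLASS = re.compile(
    r"Could not find or load main class \[\d+\.\d+s\]\[\w+\]"
)


def diagnose(lines: Iterable[str]) -> Dict[str, bool]:
    """Report which of the two known symptoms appear in the given log lines."""
    found = {"cpuset_warning": False, "log_prefix_as_main_class": False}
    for line in lines:
        if CPUSET_WARNING.search(line):
            found["cpuset_warning"] = True
        if LOG_PREFIX_AS_MAIN_CLASS.search(line):
            found["log_prefix_as_main_class"] = True
    return found


sample = [
    "systemd-entrypoint[2065]: [0.001s][warning][os,container] Duplicate cpuset "
    "controllers detected. Picking /sys/fs/cgroup/cpuset, skipping /sys/fs/cgroup/cpuset.",
    "systemd-entrypoint[2065]: Error: Could not find or load main class "
    "[0.001s][warning][os,container]",
]
print(diagnose(sample))
# prints {'cpuset_warning': True, 'log_prefix_as_main_class': True}
```

Running the same check against the output of a working version (for example 7.12) would show whether the warning is present there too, which helps separate "the warning appears" from "the warning breaks startup".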

I changed the tasks and can now share some new information:

  • elasticsearch.log is empty
  • The whole role works when I switch from elasticsearch to elasticsearch-oss. I'm running tests for both variants with the same code, and just switching to the OSS variant makes the role work.

(Of course I'm not just changing the package name but the repository, too :wink: )

Sorry, the OSS version only "worked" because I hadn't realized that installing the latest elasticsearch-oss package gives you 7.10, the last release with OSS packages. So nothing has changed: 7.13 fails, earlier versions work.