Somewhat recently we upgraded our cluster to Elasticsearch 6.8 and, in the process, started running Elasticsearch in Docker containers, one per host, using the elasticsearch:6.8.0 image from Docker Hub. Testing and deployment went fine, but now that we're applying system updates, we find that any time we stop and restart a container, Elasticsearch crashes on startup (while reading index state) with SIGILL:
elasticsearch_1 | [2019-09-20T14:56:03,000][INFO ][o.e.x.m.p.l.CppLogMessageHandler] [esworker09] [controller/113] [Main.cc@109] controller (64 bit): Version 6.8.0 (Build e6cf25e2acc5ec) Copyright (c) 2019 Elasticsearch BV
elasticsearch_1 | #
elasticsearch_1 | # A fatal error has been detected by the Java Runtime Environment:
elasticsearch_1 | #
elasticsearch_1 | # SIGILL (0x4) at pc=0x00007fde7c97c8c8, pid=1, tid=88
elasticsearch_1 | #
elasticsearch_1 | # JRE version: OpenJDK Runtime Environment (12.0.1+12) (build 12.0.1+12)
elasticsearch_1 | # Java VM: OpenJDK 64-Bit Server VM (12.0.1+12, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
elasticsearch_1 | # Problematic frame:
elasticsearch_1 | # J 4984 c2 com.fasterxml.jackson.core.sym.ByteQuadsCanonicalizer.findName([II)Ljava/lang/String; (198 bytes) @ 0x00007fde7c97c8c8 [0x00007fde7c97c7a0+0x0000000000000128]
elasticsearch_1 | #
The only consistent workaround we've found so far is clearing the data directories, which are bind-mounted into the container from the host. It doesn't matter whether we restart the old container or create a new one. Further complicating matters, one node was left in a crash loop for about 12 hours and then started working again with no intervention. Upgrading to 6.8.3 with OpenJDK 12.0.2 did not help.
Any suggestions on how to further troubleshoot this?