Elasticsearch 6.8 in Docker crashing with SIGILL after restart - corrupt state?

retzkek · September 20, 2019, 3:31pm

Somewhat recently we upgraded our cluster to 6.8 and, in the process, started running Elasticsearch in Docker containers, one per host, using the elasticsearch:6.8.0 image from Docker Hub. Testing and deployment went fine, but now that we're doing system updates we find that any time we stop and restart a container, it crashes on startup (while reading index state) with SIGILL:

elasticsearch_1  | [2019-09-20T14:56:03,000][INFO ][o.e.x.m.p.l.CppLogMessageHandler]     [esworker09] [controller/113] [Main.cc@109] controller (64 bit): Version 6.8.0 (Build e6cf25e2acc5ec) Copyright (c) 2019 Elasticsearch BV
elasticsearch_1  | #
elasticsearch_1  | # A fatal error has been detected by the Java Runtime Environment:
elasticsearch_1  | #
elasticsearch_1  | #  SIGILL (0x4) at pc=0x00007fde7c97c8c8, pid=1, tid=88
elasticsearch_1  | #
elasticsearch_1  | # JRE version: OpenJDK Runtime Environment (12.0.1+12) (build 12.0.1+12)
elasticsearch_1  | # Java VM: OpenJDK 64-Bit Server VM (12.0.1+12, mixed mode, sharing, tiered, compressed oops, g1 gc, linux-amd64)
elasticsearch_1  | # Problematic frame:
elasticsearch_1  | # J 4984 c2 com.fasterxml.jackson.core.sym.ByteQuadsCanonicalizer.findName([II)Ljava/lang/String; (198 bytes) @ 0x00007fde7c97c8c8 [0x00007fde7c97c7a0+0x0000000000000128]
elasticsearch_1  | #

The only consistent solution we've found so far is clearing the data directories, which are bind-mounted into the container from the host. It doesn't matter whether we restart the old container or create a new one. Further complicating matters, one node was left in a crash loop for about 12 hours and then it started working with no intervention. Upgrading to 6.8.3 with OpenJDK 12.0.2 did not help.

Any suggestions on how to further troubleshoot this?

retzkek · October 2, 2019, 3:22pm

Follow-up in case anyone else runs across this: with the help of some patient engineers at Elastic{ON} yesterday, we determined that the JVM was sometimes misidentifying the available instruction sets (specifically SSSE3) on our older Opterons. Manually limiting it to SSE2 (-XX:UseSSE=2) seems to have solved it.
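
For anyone wanting to try the same workaround, here is a minimal sketch of one way to pass the flag to the container. It assumes a docker-compose setup (the `elasticsearch_1 |` log prefix above suggests one) and uses the `ES_JAVA_OPTS` environment variable that the Elasticsearch Docker image reads. The service name and the host data path are placeholders, not taken from our actual configuration.

```yaml
# Sketch only: service name and host path are hypothetical.
version: "2.2"
services:
  elasticsearch:
    image: elasticsearch:6.8.0
    environment:
      # Extra JVM options are appended to those from jvm.options.
      # -XX:UseSSE=2 limits HotSpot to SSE2 and below.
      - "ES_JAVA_OPTS=-XX:UseSSE=2"
    volumes:
      # Bind-mounted data directory, as described in the first post.
      - /path/on/host/esdata:/usr/share/elasticsearch/data
```

Adding the `-XX:UseSSE=2` line to the image's `config/jvm.options` file should have the same effect if you prefer to keep JVM settings out of the environment.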
