Elasticsearch 5.5.0 cluster crash -- elasticsearch.yml processors set too high

The cluster has been stable since downgrading from Java8-131 to 112.

Cluster options

cluster.name: ES_CLUSTER1
node.name: "node1"
node.attr.rack: "rackA6-1"
node.master: true
node.data: true

Data options

path.data: /opt/elasticsearch
path.logs: /var/log/elasticsearch

Memory options

bootstrap.memory_lock: true

Network options

network.host: "10.10.10.1"
http.port: 9200

Discovery options

discovery.zen.ping.unicast.hosts: ["10.10.10.1", "10.10.10.2", "10.10.10.3"]
discovery.zen.minimum_master_nodes: 2

Gateway

gateway.recover_after_nodes: 3

Index cache settings

indices.memory.index_buffer_size: 5%

Other

processors: 320
action.auto_create_index: .security,.monitoring*,.watches,.triggered_watches,.watcher-history*,logstash-*
action.destructive_requires_name: true

X-Pack settings

xpack.ml.enabled: "false"
xpack.security.enabled: "false"

Good to hear.

Did you upgrade java at the same time as Elasticsearch, or before/after?

Caused by: java.lang.OutOfMemoryError: unable to create new native thread

processors: 320 

How many cores are on this machine? This is the root cause of the issue and I will be surprised if any version of Java is able to properly handle the number of native threads that you are implicitly trying to create. My guess is that you can simply unset that setting and things should go back to working.

If this box actually has 320 logical cores available to it, then you should use containers or VMs to carve up the box into multiple nodes rather than dedicate 320 cores to a single node. The JVM is simply unable to sustain that. An example of this type of failure can be seen here. Fortunately the JVM has improved since then, but not to 320 cores (the default maximum is now the number of available processors; it used to be capped at 24 and later 32, and by setting the value explicitly you override that cap).
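For context, here is a rough sketch of how the `processors` value fans out into thread counts. The formulas approximate the fixed thread pools documented for Elasticsearch 5.x (minor pools and queue sizes omitted), so treat the exact numbers as illustrative:

```python
# Approximate 5.x fixed thread-pool sizing, driven by the `processors` setting.
# Formulas paraphrased from the 5.x thread-pool documentation; smaller scaling
# pools (management, snapshot, etc.) are left out for brevity.

def pool_sizes(processors):
    return {
        "index": processors,                # fixed pool: one thread per processor
        "bulk": processors,                 # fixed pool: one thread per processor
        "get": processors,                  # fixed pool: one thread per processor
        "search": processors * 3 // 2 + 1,  # fixed pool: (3/2 * processors) + 1
    }

def total_threads(processors):
    return sum(pool_sizes(processors).values())

# With processors: 320, the major pools alone imply well over a thousand
# native threads; with the hardware's actual 40 logical CPUs the count is
# far more modest.
print(total_threads(320))  # 1441
print(total_threads(40))   # 181
```

Each of those native threads also reserves its own stack, which is why the failure surfaces as `unable to create new native thread` rather than a heap error.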


At the same time. Lesson learned.

I've used this setting for a long time now with previous versions. Maybe you're right that it causes problems with Java8-131; maybe it always should have failed and the limit is now being enforced in some way.

I guess I misread the Elasticsearch documentation in thinking more threads can boost performance on otherwise underutilized CPU resources. I believe it was recommended to try doubling the value or even more, but I cannot find the reference now.

The boxes have 20 cores / 40 threads. I reached that value by trial and error; I think it related to issues with the index and bulk queue thread pools. With the default values we would hardly reach 5k EPS, and through trial and error I was able to boost performance to 25k - 50k EPS.

Maybe it's time to re-test and take better notes. There are a myriad of variables to track when tuning for performance.
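If the bottleneck really was queue rejections on the index and bulk pools, a hypothetical elasticsearch.yml fragment along these lines would raise the 5.x per-pool queue sizes without inflating `processors` (setting names from the 5.x thread-pool settings; the values are illustrative, not recommendations):

```
# Match processors to the real hardware (20 cores / 40 threads)
processors: 40

# Absorb indexing bursts in the queues instead of adding threads
thread_pool.bulk.queue_size: 500
thread_pool.index.queue_size: 400
```

Larger queues trade memory and latency for fewer rejections, so they are worth re-testing alongside the default pool sizes.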


I think you are correct: two nodes just crashed with the same error, and this was with Java8-112.

I changed the processors setting to 40 and the thread count dropped from ~1700 to ~500. Seems stable so far. Thanks for the help.


This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.