Bad performance and crashes with Elasticsearch 5.1

Vitaly_il · April 19, 2017, 11:37am

We're running ELK 5.1.1; Elasticsearch runs on a standalone EC2 instance with 4 CPUs and 16GB RAM.
We're indexing about 20GB daily across 40 indexes per day, so with 30-40 days of retention we have ~800GB of data in about 1800 indexes.
This is a staging environment. In production we're running Elasticsearch 2.x, and a server with the same specs works fine with >300GB/day, i.e. about 15 times more.
As far as I can see, our traffic is very low for this server; our baseline is about 3% iowait and about 15% user CPU load.
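As a quick sanity check on those numbers, here is a back-of-the-envelope sketch in Python. It uses only the figures quoted above; the derived values (implied retention and average index size) are rough estimates, not measurements from the cluster.

```python
# Figures quoted in the post above.
daily_gb = 20
indexes_per_day = 40
total_gb = 800
total_indexes = 1800


def implied_retention_days(total, per_day):
    """Days of retention implied by a stored total and a daily rate."""
    return total / per_day


# Retention implied by data volume and by index count.
retention_by_size = implied_retention_days(total_gb, daily_gb)
retention_by_count = implied_retention_days(total_indexes, indexes_per_day)

# Average size of a single index, in MB (1 GB taken as 1024 MB).
avg_index_mb = total_gb * 1024 / total_indexes

print(f"retention implied by size:  {retention_by_size:.0f} days")
print(f"retention implied by count: {retention_by_count:.0f} days")
print(f"average index size:         {avg_index_mb:.0f} MB")
```

The point of the estimate is that the node holds a large number of fairly small indexes, each well under 1GB on average.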
We have two issues:

1. All searches are slow. For instance, a basic discovery over the last week takes about 40 seconds. During that time user CPU usage is close to 100%, while iowait stays low, up to 5%. Many queries are aborted by circuit breakers, and when that happens Elasticsearch stops indexing.
2. From time to time it stops indexing.
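To see which circuit breakers are firing, the node stats API (`GET _nodes/stats/breaker`) reports a `tripped` counter per breaker. A minimal Python sketch for summarizing such a response follows. The `tripped_breakers` helper and the sample data are hypothetical and for illustration only; the sample is not output from this cluster.

```python
def tripped_breakers(stats):
    """Return (node_name, breaker_name, tripped_count) for every breaker
    whose tripped counter is above zero in a parsed node stats response."""
    found = []
    for node in stats.get("nodes", {}).values():
        for name, breaker in node.get("breakers", {}).items():
            if breaker.get("tripped", 0) > 0:
                found.append((node.get("name"), name, breaker["tripped"]))
    return found


# Illustrative sample shaped like a node stats response (values are made up).
sample = {
    "nodes": {
        "abc123": {
            "name": "node-1",
            "breakers": {
                "request": {"estimated_size_in_bytes": 900, "tripped": 3},
                "fielddata": {"estimated_size_in_bytes": 100, "tripped": 0},
            },
        }
    }
}

for node_name, breaker_name, count in tripped_breakers(sample):
    print(f"{node_name}: breaker '{breaker_name}' tripped {count} time(s)")
```

In practice the dictionary would come from parsing the JSON body returned by the node stats call rather than from a hard-coded sample.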

"jps -l -m -v" output:
16460 sun.tools.jps.Jps -l -m -v -Dapplication.home=/usr/lib/jvm/jdk1.8.0_101 -Xms8m
1916 org.elasticsearch.bootstrap.Elasticsearch -d -p /var/run/elasticsearch/elasticsearch.pid -Edefault.path.logs=/var/log/elasticsearch -Edefault.path.data=/elasticsearch/data -Edefault.path.conf=/etc/elasticsearch -Xms8g -Xmx8g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+DisableExplicitGC -XX:+AlwaysPreTouch -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Dlog4j.skipJansi=true -XX:+HeapDumpOnOutOfMemoryError -Djna.tmpdir=/elasticsearch/tmp -Des.path.home=/usr/share/elasticsearch

Any ideas?

TIA, Vitaly

jasontedor · April 19, 2017, 12:20pm

All versions less than 5.2.2 are impacted by three issues:

1. The Netty recycler (Netty is the framework we use for networking) is fundamentally flawed, so we disabled it in 5.1.2 (https://github.com/elastic/elasticsearch/pull/22452)
2. Collecting stats while indexing could cause nodes to drop (https://github.com/elastic/elasticsearch/pull/22317)
3. A client that disconnects during an HTTP request could cause a circuit breaker leak (https://github.com/elastic/elasticsearch/pull/23310/files)
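A small Python sketch of the version check implied here: it compares a version string, such as the `version.number` field returned by `GET /` on a node, against 5.2.2. The helper names are hypothetical, and the comparison takes the statement above literally (any version below 5.2.2 counts as affected by at least one of the three issues).

```python
# First release containing all three fixes, per the statement above.
FIRST_UNAFFECTED = (5, 2, 2)


def parse_version(text):
    """Turn '5.1.1' into (5, 1, 1); a suffix such as '-SNAPSHOT' is ignored."""
    core = text.split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))


def is_affected(version):
    """True if the version sorts before 5.2.2 (literal reading of the reply)."""
    return parse_version(version) < FIRST_UNAFFECTED


for v in ("5.1.1", "5.1.2", "5.2.2", "5.3.0"):
    status = "affected" if is_affected(v) else "not affected"
    print(f"{v}: {status}")
```

Note that 5.1.2 still shows as affected: it has the recycler disabled, but not the other two fixes.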

From your description, I think you are most likely heavily impacted by the first issue.

I think you should upgrade.

Vitaly_il · April 26, 2017, 6:29am

Jason, many thanks!
We upgraded to 5.3 last Thursday, and so far the system seems much better: no crashes in a week.
Vitaly

jasontedor · April 26, 2017, 9:40am

You're welcome!

system · May 24, 2017, 9:49am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
