We're running ELK 5.1.1; Elasticsearch runs on a standalone EC2 instance with 4 CPUs and 16GB RAM.
We're indexing about 20GB daily into 40 indexes per day, so with 30-40 days of retention we have ~800GB of data in about 1800 indexes.
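A back-of-the-envelope sketch of what those figures imply per index and per shard. The shard count per index is an assumption (5 primaries is the Elasticsearch 5.x default), not something measured on this cluster; the heap size is taken from the -Xms8g/-Xmx8g flags in the jps output.

```python
# Rough sizing arithmetic from the figures quoted in the post.
total_data_gb = 800      # ~800GB retained
index_count = 1800       # ~1800 indexes
heap_gb = 8              # from -Xms8g -Xmx8g

# ASSUMPTION: 5 primary shards per index (Elasticsearch 5.x default).
# Replace with the real value from the index templates.
shards_per_index = 5

avg_index_mb = total_data_gb * 1024 / index_count
total_shards = index_count * shards_per_index
avg_shard_mb = avg_index_mb / shards_per_index
shards_per_heap_gb = total_shards / heap_gb

print(f"average index size: {avg_index_mb:.0f} MB")
print(f"total primary shards: {total_shards}")
print(f"average shard size: {avg_shard_mb:.0f} MB")
print(f"shards per GB of heap: {shards_per_heap_gb:.0f}")
```

If the assumed shard count holds, that would be thousands of very small shards on a single 8GB heap, which is worth checking before looking at anything else.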
This is our staging environment. In production we're running Elasticsearch 2.x, and a server with the same specs copes well with >300GB/day, i.e. about 15 times more.
As far as I can see, our traffic is very low for this server; our baseline is about 3% iowait and about 15% user CPU load.
We have two issues:
- All searches are slow. For instance, a basic Discover query over the last week takes about 40 seconds. While it runs, user CPU usage is close to 100% and iowait stays low (up to 5%). Many queries are aborted by circuit breakers, and when that happens Elasticsearch stops indexing.
- From time to time Elasticsearch stops indexing.
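To see which circuit breakers are tripping, the node stats API (`GET /_nodes/stats/breaker`) can be polled. Below is a minimal sketch, assuming the node is reachable at `http://localhost:9200` without authentication; the function names `fetch_breaker_stats` and `tripped_breakers` are made up for illustration.

```python
import json
from urllib.request import urlopen


def fetch_breaker_stats(base_url="http://localhost:9200"):
    """Fetch circuit breaker stats from the node stats API.

    ASSUMPTION: base_url points at the Elasticsearch HTTP port, no auth.
    """
    with urlopen(base_url + "/_nodes/stats/breaker") as resp:
        return json.loads(resp.read().decode("utf-8"))


def tripped_breakers(stats):
    """Return {(node_name, breaker_name): tripped_count} for tripped breakers."""
    result = {}
    for node in stats.get("nodes", {}).values():
        node_name = node.get("name", "?")
        for breaker_name, breaker in node.get("breakers", {}).items():
            tripped = breaker.get("tripped", 0)
            if tripped > 0:
                result[(node_name, breaker_name)] = tripped
    return result
```

Calling `tripped_breakers(fetch_breaker_stats())` on the affected node would list the breakers (for example request or fielddata) whose `tripped` counter is non-zero, which helps narrow down whether searches or something else exhaust the heap.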
"jps -l -m -v" output:
16460 sun.tools.jps.Jps -l -m -v -Dapplication.home=/usr/lib/jvm/jdk1.8.0_101 -Xms8m
1916 org.elasticsearch.bootstrap.Elasticsearch -d -p /var/run/elasticsearch/elasticsearch.pid -Edefault.path.logs=/var/log/elasticsearch -Edefault.path.data=/elasticsearch/data -Edefault.path.conf=/etc/elasticsearch -Xms8g -Xmx8g -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+DisableExplicitGC -XX:+AlwaysPreTouch -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Dlog4j.skipJansi=true -XX:+HeapDumpOnOutOfMemoryError -Djna.tmpdir=/elasticsearch/tmp -Des.path.home=/usr/share/elasticsearch