Elasticsearch + logstash: recurrent high load

Hello Everybody,

I have deployed logstash + elasticsearch (plus redis/kibana) for my central
logging solution. Everything runs fine except that I have system load
bursts every 10 hours (more or less). I have attached the system load graph
to the post.

At first the load bursts reached 60. After I upgraded elasticsearch from
version 0.20 to 0.90.3, they now peak at around 15.

Apart from the load bursts every 10 hours, the load is fine (between 0.2
and 1).
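
If the samples behind the load graph are available as (timestamp, load) pairs, the "every 10 hours" period can be measured instead of estimated by eye. Below is a minimal Python sketch under that assumption. The function names and the 5.0 threshold are illustrative choices (picked to sit between the normal load of 0.2 to 1 and the bursts of about 15), and the sample data is made up.

```python
from datetime import datetime, timedelta


def burst_starts(samples, threshold=5.0):
    """Return the timestamps at which the load first rises above threshold."""
    starts = []
    in_burst = False
    for ts, load in samples:
        above = load > threshold
        if above and not in_burst:
            starts.append(ts)
        in_burst = above
    return starts


def intervals_hours(starts):
    """Return the gaps, in hours, between consecutive burst starts."""
    return [(b - a).total_seconds() / 3600 for a, b in zip(starts, starts[1:])]


# Made-up data: one sample every 30 minutes, a spike every 20th sample.
t0 = datetime(2013, 9, 1)
demo = [
    (t0 + timedelta(minutes=30 * i), 15.0 if i and i % 20 == 0 else 0.5)
    for i in range(61)
]
print(intervals_hours(burst_starts(demo)))
```

A steady interval would point to something scheduled or periodic, while a drifting one would suggest a threshold being crossed (for example heap occupancy).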

To be honest, I'm not a Java expert, so I'll try to give as much information
as I can:

  • Elasticsearch runs on a single server

  • The server is a Dell R710 with 32GB RAM (16GB of it dedicated to
    Elasticsearch => ES_HEAP_SIZE=16g) and a dedicated RAID disk subsystem.

  • OS is an up-to-date 64-bit CentOS (Kernel 2.6.32-358.14.1.el6.x86_64)

  • Java version installed from RPM "jdk-7u21-linux-x64.rpm"
    java version "1.7.0_21"
    Java(TM) SE Runtime Environment (build 1.7.0_21-b11)
    Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)

  • Elasticsearch and logstash are the only processes requiring CPU power on
    this server

  • For now elasticsearch only indexes logs from 3 jboss servers and 1 apache
    (one index per type of log). The load bursts also occur at night, when
    traffic is very low (hence few logs to index and no kibana queries)

  • During a load burst, "top" shows elasticsearch using several hundred
    percent CPU

  • There is low %iowait and low disk utilization during the load bursts

  • There is nothing interesting in the logs

  • limits.conf has been modified:

elasticsearch soft nofile 64000
elasticsearch hard nofile 64000
elasticsearch soft memlock unlimited
elasticsearch hard memlock unlimited

  • Elasticsearch indexes have not been customized

  • Elasticsearch configuration has been modified following advice from
    several blogs, but with no success (see elasticsearch.yml attached)

  • Elasticsearch process looks like this :

496 158397 50.2 53.7 26140636 17664680 ? Sl Aug29 4487:29
/usr/java/default/bin/java -Xms16g -Xmx16g -Xss256k
-Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
-XX:+HeapDumpOnOutOfMemoryError -Delasticsearch
-Des.pidfile=/var/run/elasticsearch/elasticsearch.pid
-Des.path.home=/usr/share/elasticsearch -cp
:/usr/share/elasticsearch/lib/elasticsearch-0.90.3.jar:/usr/share/elasticsearch/lib/:/usr/share/elasticsearch/lib/sigar/
-Des.default.path.home=/usr/share/elasticsearch
-Des.default.path.logs=/var/log/elasticsearch -Des.default.path.data=/data
-Des.default.path.work=/tmp/elasticsearch
-Des.default.path.conf=/etc/elasticsearch
org.elasticsearch.bootstrap.ElasticSearch

  • I have attached the output of curl localhost:9200/_nodes/hot_threads to
    this post
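
Since the bursts also happen at night, a hot_threads dump taken by hand can easily miss them. The sketch below polls the 1-minute load average and saves the output of the hot_threads URL above whenever the load crosses a threshold. It is only an illustration: it assumes Python 3 is available on the server, and the threshold, polling interval, file naming, and function names are all hypothetical choices.

```python
import os
import time
import urllib.request
from datetime import datetime

# Same endpoint as the curl command mentioned above.
HOT_THREADS_URL = "http://localhost:9200/_nodes/hot_threads"


def should_capture(load_1min, threshold=5.0):
    """True when the 1-minute load average is at or above the threshold."""
    return load_1min >= threshold


def capture_hot_threads(url=HOT_THREADS_URL, out_dir="."):
    """Fetch the hot_threads output and save it to a timestamped file."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    path = os.path.join(out_dir, "hot_threads-%s.txt" % stamp)
    with urllib.request.urlopen(url, timeout=10) as resp:
        body = resp.read()
    with open(path, "wb") as fh:
        fh.write(body)
    return path


def watch(poll_seconds=30, threshold=5.0):
    """Poll the load average forever and dump hot threads during bursts."""
    while True:
        load_1min = os.getloadavg()[0]
        if should_capture(load_1min, threshold):
            print(capture_hot_threads())
        time.sleep(poll_seconds)
```

Calling `watch()` from a screen session or a small init script would leave a series of dumps from inside each burst, which can then be compared with a dump taken at normal load.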

It looks as if some cache or index refresh makes elasticsearch go crazy
every 10 hours. Any ideas?

Please tell me if you need more info.

Thanks a lot in advance
Greg

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Here is a helpful presentation:

Uwe Schindler's talk at Berlin Buzzwords called "Testing Lucene and Solr
with various JVMs: bugs, bugs, bugs" where he explains some bugs you may
encounter depending on the JVM you are using and gives a final
recommendation about JVMs that may be used with Lucene. The video is about
44 minutes:

[video] http://www.youtube.com/watch?v=PVRdLyQGUxE

[slides] http://berlinbuzzwords.de/sites/berlinbuzzwords.de/files/slides/Schindler-BugsBugsBugs.pdf

The bottom line: There seem to be two best options:

  1. Oracle Java 6 at some older version.
  2. OpenJDK 7 (and not 6. Note that comparing version numbers between
    Oracle Java and OpenJDK is meaningless).

Hope this helps.

On Wednesday, September 4, 2013 10:37:33 AM UTC-4, Greg Bui wrote:

  • Java version installed from RPM "jdk-7u21-linux-x64.rpm"
    java version "1.7.0_21"
    Java(TM) SE Runtime Environment (build 1.7.0_21-b11)
    Java HotSpot(TM) 64-Bit Server VM (build 23.21-b01, mixed mode)
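
When trying a different JVM, it may be worth confirming which one the Elasticsearch java binary really is, since the version banner is easy to misread when several JDKs are installed. A rough Python sketch follows. The function names are made up for illustration, and the default path is simply the java binary shown in the process listing of the original post.

```python
import re
import subprocess


def parse_java_banner(banner):
    """Extract the version string and the VM name from `java -version` text."""
    version = re.search(r'version "([^"]+)"', banner)
    vm_line = next(
        (line.strip() for line in banner.splitlines() if " VM" in line), ""
    )
    return {
        "version": version.group(1) if version else None,
        # Drop the trailing "(build ...)" part, keep only the VM name.
        "vm": vm_line.split(" (build")[0] or None,
    }


def java_banner(java_path="/usr/java/default/bin/java"):
    """Run `java -version`; the JVM prints this banner on stderr."""
    result = subprocess.run(
        [java_path, "-version"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    return result.stderr
```

For example, `parse_java_banner(java_banner())` on the server described above should report version 1.7.0_21 and a HotSpot 64-Bit Server VM, matching the banner quoted here.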

Thanks for the hint !

I'll try with Oracle 1.6 u45 and send my results back.

Cheers !
Greg

On Wednesday, September 4, 2013 5:09:32 PM UTC+2, InquiringMind wrote:

The bottom line: There seem to be two best options:

  1. Oracle Java 6 at some older version.
  2. OpenJDK 7 (and not 6. Note that comparing version numbers between
    Oracle Java and OpenJDK is meaningless).

Hello,

I've switched to JVM 1.6.0_37, but it doesn't seem to solve the problem.

Do you have any other ideas that could explain these load bursts?

Thanks again
Greg

On Wednesday, September 4, 2013 5:42:18 PM UTC+2, Greg Bui wrote:

Thanks for the hint !

I'll try with Oracle 1.6 u45 and send my results back.
