Hi,
We're running a two-node ES 1.0.3 cluster with the following setup:
VM on host A:
4 vCore CPU
32GB RAM
ES master (only node being queried)
MySQL slave (used as a backup, never queried)
JVM settings
/usr/lib/jvm/java-7-openjdk-amd64//bin/java -Xms2g -Xmx2g -Xss256k
-Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
-XX:+HeapDumpOnOutOfMemoryError -Delasticsearch
-Des.pidfile=/var/run/elasticsearch.pid
-Des.path.home=/usr/share/elasticsearch -cp
:/usr/share/elasticsearch/lib/elasticsearch-1.0.3.jar:/usr/share/elasticsearch/lib/:/usr/share/elasticsearch/lib/sigar/
-Des.default.config=/etc/elasticsearch/elasticsearch.yml
-Des.default.path.home=/usr/share/elasticsearch
-Des.default.path.logs=/home/log/elasticsearch
-Des.default.path.data=/home/elasticsearch
-Des.default.path.work=/tmp/elasticsearch
-Des.default.path.conf=/etc/elasticsearch
org.elasticsearch.bootstrap.Elasticsearch
VM on host B:
2 vCore CPU
16GB RAM
ES data node (searches are dispatched to it, no indexing)
MySQL master
JVM settings
/usr/lib/jvm/java-7-openjdk-amd64//bin/java -Xms2g -Xmx2g -Xss256k
-Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
-XX:+HeapDumpOnOutOfMemoryError -Delasticsearch
-Des.pidfile=/var/run/elasticsearch.pid
-Des.path.home=/usr/share/elasticsearch -cp
:/usr/share/elasticsearch/lib/elasticsearch-1.0.3.jar:/usr/share/elasticsearch/lib/:/usr/share/elasticsearch/lib/sigar/
-Des.default.config=/etc/elasticsearch/elasticsearch.yml
-Des.default.path.home=/usr/share/elasticsearch
-Des.default.path.logs=/home/log/elasticsearch
-Des.default.path.data=/home/elasticsearch
-Des.default.path.work=/tmp/elasticsearch
-Des.default.path.conf=/etc/elasticsearch
org.elasticsearch.bootstrap.Elasticsearch
Before we got the 4 vCore CPU / 32GB VM, the master node had the same specs
as the secondary node.
On this cluster, we have a 5-shard (+5 replicas) index - we'll call it main
- with ~130k documents at the moment, for a size of about 120MB. A cron job
that runs every 5 minutes updates the documents our customers modified in
our application, at most 2k docs per run; we can have a few thousand docs
queued at a time.
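For context, the cron essentially pushes partial updates through the bulk
API; a stripped-down sketch of one batch (index, type, IDs and fields below
are made up for the example, not our real mapping):

curl -XPOST 'localhost:9200/_bulk' -d '
{"update": {"_index": "main", "_type": "document", "_id": "123"}}
{"doc": {"title": "new title", "updated_at": "2014-05-20T10:00:00"}}
{"update": {"_index": "main", "_type": "document", "_id": "124"}}
{"doc": {"status": "archived"}}
'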
We also use Logstash to log some user actions our application relies on,
into monthly indices. Those indices have 2 shards (+2 replicas) with 1-6M
docs each, ranging in size from 380MB to 1.5GB. At the moment we have 11
log indices.
We run search queries on both the main index and the latest log indices.
Occasionally, some queries hit older log indices.
Looking at our stats, I'd say we have a 2:1 indexing/searching ratio, but
it can vary with seasonality.
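To give an idea of the query shape, a typical search is a filtered query
hitting the main index and the current month's log index, roughly like this
(index and field names are made up for the example):

curl -XGET 'localhost:9200/main,logs-2014.05/_search' -d '
{
  "query": {
    "filtered": {
      "query": { "match": { "message": "some user action" } },
      "filter": {
        "bool": {
          "must": [
            { "term": { "customer_id": 42 } },
            { "range": { "@timestamp": { "gte": "now-7d" } } }
          ]
        }
      }
    }
  }
}'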
We also have a dedicated 1-shard (+1 replica) percolator index against
which we run a percolation query for each log entry before it is indexed
into ES through Logstash.
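The percolation request Logstash sends before indexing each log entry looks
roughly like this (index, type and fields are again just examples):

curl -XGET 'localhost:9200/percolator_index/logs/_percolate' -d '
{
  "doc": {
    "user_id": 42,
    "action": "login",
    "@timestamp": "2014-05-20T10:00:00"
  }
}'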
We never optimized any index.
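If it could help, I suppose we could run an optimize on the older monthly
indices that are no longer written to, something like (index name made up):

curl -XPOST 'localhost:9200/logs-2014.01/_optimize?max_num_segments=1'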
Our issue:
Since we upgraded ES to v1.0.3 to deal with a field data circuit breaker
bug, everything ran fine until we experienced a drastic CPU usage increase
(from near 100% to 200%) for no apparent reason (no change in our
application nor in the traffic we receive). No ES restart was able to bring
CPU usage back to normal. As an emergency measure, we switched our main
node from 2 vCore CPU / 16GB to 4 vCore CPU / 32GB, and the CPU usage of
the new node never went beyond 30% for almost 10 days. Then the issue
happened again: CPU usage rose to 400% for no apparent reason.
It is worth noting that the secondary node is not subject to this issue.
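Next time it happens, is capturing the hot threads output on the master the
right way to see what is actually burning CPU? I was thinking of something
like:

# take a few samples, a few seconds apart
curl 'localhost:9200/_nodes/hot_threads?threads=5&interval=500ms'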
Our outsourcer told us this CPU increase was due to deadlocks caused by
malformed queries, but such malformed queries had already occurred before,
and restarting ES didn't fix the high CPU usage.
He also told us our servers didn't have enough resources and that it would
be better to have 2 servers for the MySQL master/slave and 2 to 3 distinct
servers for the ES cluster, which seems odd given that the main ES server
peaked at 30% CPU usage for days.
We plan to upgrade ES to see whether this issue is a bug that has already
been fixed, but are there other things we could try? I wonder whether our
2GB JVM heap is enough, given how much data we have, the many filters we
use in our search queries, and the fact that more than 8GB of memory sits
unused on our main node. Does the fact that the secondary node is not
affected by this issue suggest it is an indexing issue?
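To check whether the 2GB heap is the limit, I was thinking of watching
heap, field data and filter cache usage through the nodes stats API, along
these lines (and, if needed, raising ES_HEAP_SIZE in
/etc/default/elasticsearch, which is where the Debian package reads it
from, if I'm not mistaken):

# heap usage plus field data / filter cache memory, per node
curl 'localhost:9200/_nodes/stats/jvm,indices?pretty'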