Hi All,
We've seen a jump in CPU usage after upgrading the cluster from 0.19.2 to
0.19.10.
Background info:
- 3 node cluster
- 5 shards, 1 replica
- ~33 million documents, ~32 GB in size
- ~110k/80k new/deleted documents/day
- ~300-500k searches/day
Each node is allocated 4 GB of JVM heap.
We upgraded the cluster on Sunday night, with a full shutdown and restart of
all nodes.
Our busy days are weekdays, and for high-level comparison purposes traffic is
the same. Comparing our metrics archive from last Friday (0.19.2) to
Monday (0.19.10), we see the following:
Ratio Threshold: > 2.00 or < 0.500
20130405.0.bz2  20130408.0  Ratio   Metric-Instance
7927            293607      37.04   elasticsearch.nodes.indices.merges.current_size_in_bytes ["es3"]*
0.453           11.99       26.48   elasticsearch.nodes.indices.merges.current_docs ["es3"]*
0.001           0.019       19.00   disk.dev.read ["xvda"]
0.019           0.233       12.26   disk.dev.read_bytes ["xvda"]
0.178           1.004        5.64   elasticsearch.nodes.process.cpu.user_in_millis ["es2"]*
20.79           106.9        5.14   elasticsearch.nodes.process.cpu.percent ["es2"]*
0.001           0.005        5.00   elasticsearch.nodes.indices.indexing.index_time_in_millis ["es2"]*
0.013           0.062        4.77   network.tcp.retranssegs
0.530           1.646        3.11   kernel.all.load ["15 minute"]*
0.001           0.003        3.00   elasticsearch.nodes.indices.indexing.index_time_in_millis ["es1"]*
0.001           0.003        3.00   elasticsearch.nodes.indices.indexing.index_time_in_millis ["es3"]*
0.559           1.669        2.99   kernel.all.load ["5 minute"]*
0.081           0.242        2.99   kernel.percpu.cpu.user ["cpu1"]*
0.166           0.495        2.98   elasticsearch.nodes.process.cpu.user_in_millis ["es3"]*
0.567           1.678        2.96   kernel.all.load ["1 minute"]*
18.22           53.04        2.91   elasticsearch.nodes.process.cpu.percent ["es3"]*
0.177           0.507        2.86   kernel.all.cpu.user*
0.096           0.265        2.76   kernel.percpu.cpu.user ["cpu0"]*
0.466           1.229        2.64   mem.vmstat.pgdeactivate
0.004           0.010        2.50   elasticsearch.nodes.jvm.gc.collectors.ParNew.collection_time_in_millis ["es2"]
0.004           0.010        2.50   elasticsearch.nodes.jvm.gc.collection_time_in_millis ["es2"]
0.002           0.005        2.50   mem.vmstat.pgmajfault
0.026           0.058        2.23   elasticsearch.nodes.jvm.gc.collectors.ParNew.collection_time_in_millis ["es3"]
0.026           0.058        2.23   elasticsearch.nodes.jvm.gc.collection_time_in_millis ["es3"]
0.727           1.533        2.11   elasticsearch.nodes.jvm.gc.collectors.ParNew.collection_count ["es3"]
0.727           1.533        2.11   elasticsearch.nodes.jvm.gc.collection_count ["es3"]
The 1st column is Friday's value, the 2nd is Monday's, the 3rd is the ratio
between them, and the 4th is the metric/instance.
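For reference, the per-node merge, indexing, GC and CPU figures above can be
spot-checked directly against the cluster. This is just a sketch assuming the
0.19-era node stats endpoint and localhost:9200; adjust for your setup:

    # Per-node stats; the indices.merges, indices.indexing, jvm.gc and
    # process.cpu sections correspond to the metrics in the table above.
    curl -XGET 'http://localhost:9200/_cluster/nodes/stats?pretty=true'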
So we are seeing a 3-5x increase in indexing time and a corresponding
increase in CPU time, coupled with a huge jump in merge activity (the top two
rows).
We are wondering whether the segment merge policy changed between these
versions (0.19.2 and 0.19.10)? I see that 0.19.3 included a Lucene version
upgrade, but I can't find much other information about the change.
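For what it's worth, here is roughly how we could inspect and dial back merge
activity while we investigate. This is a sketch only: it assumes the tiered
merge policy, that the index.merge.policy.* settings below exist and are
dynamically updatable on 0.19.x, and 'ourindex' is a placeholder index name:

    # Show the index's current (non-default) settings, including any
    # explicit merge policy overrides.
    curl -XGET 'http://localhost:9200/ourindex/_settings?pretty=true'

    # Tentatively allow more segments per tier, which should make merges
    # kick in less often at the cost of more segments to search.
    curl -XPUT 'http://localhost:9200/ourindex/_settings' -d '{
      "index.merge.policy.segments_per_tier": 20,
      "index.merge.policy.max_merge_at_once": 20
    }'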
Any ideas about possible causes/solutions?
Cheers,
Owen Butler