OK, so we did a bit more due diligence and with the help of some
great sematext tools, we got some measurements of what's happening in the
background.
The setup:
- 2 x Ubuntu 12.04 LTS on m1.xlarge AWS (15GB RAM)
- java version "1.7.0_03"
OpenJDK Runtime Environment (IcedTea7 2.1.1pre)
(7~u3-2.1.1~pre1-1ubuntu3)
OpenJDK 64-Bit Server VM (build 22.0-b10, mixed mode) - Elasticsearch 0.19.8 installed using a slightly adjusted version of
this chef recipe: https://github.com/karmi/cookbook-elasticsearch - 2 nodes, one index, 5 shards, replication on, 35,624,944 child docs,
42,808 parent docs, 27.5gb index size - Startup settings:
ES_JAVA_OPTS="
-server
-Djava.net.preferIPv4Stack=true
-Des.config=/usr/local/etc/elasticsearch/elasticsearch.yml
-Dcom.sun.management.jmxremote
-javaagent:/spm/spm-monitor/lib/spm-monitor-es-1.6.0-withdeps.jar=/spm/spm-monitor/conf/spm-monitor-config-4606509b-b72c-44f4-be24-5d3c4ef37b5c-default.xml
-Xms9986m
-Xmx9986m
-Xss128k
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
-XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=/mnt/elasticsearch/logs/heapdump.hprof"
The test:
- Cycle through 300 random keywords doing a top_children query every 5
seconds for 60 times (about 25 hours) - Measure response times and mark anything taking over 10 seconds as a
"long" response time - While the test is running, some documents might be added at a random
rate; in this particular example, only 28 docs were indexed within the two
hours preceding the observations below
The results:
- After waiting for the first initial queries to warm up the caches,
subsequent queries consistently timed under one second on average - At some point during the test, query response times dropped
precipitously. The numbers are below and screen shots of the server health
stats are attached - It seems that GC is kicking in and pausing the system
- A curious observation is that both the filter caches and field caches
drop on their own even though we have not specified an expiration; they
should normally default to -1, i.e. no expiration, right?
Comments:
- Testing with the warming API will be next
- Still need to test with bootstrap.mlockall
- Wondering whether we need to try -XX:+UseG1GC since we are on JDK 7
Do people generally use the default GC settings here or do they tweak to
match their usage profiles? If the latter, what do people look at in terms
of GC to minimize pauses when it comes to supporting real-time search?
Would adding more nodes and shards help?
Looking forward to any feedback.
-GS
Test timings:
03-08-2012 10:07:28 ES Speed Checker INFO - Searching for: sparklingly
03-08-2012 10:07:28 ES Speed Checker INFO - Server time: 4,
03-08-2012 10:07:28 ES Speed Checker INFO - Total time: 22
03-08-2012 10:07:28 ES Speed Checker INFO -
03-08-2012 10:07:33 ES Speed Checker INFO - Searching for: swedish
03-08-2012 10:07:33 ES Speed Checker INFO - Server time: 12,
03-08-2012 10:07:33 ES Speed Checker INFO - Total time: 30
03-08-2012 10:07:33 ES Speed Checker INFO -
03-08-2012 10:07:38 ES Speed Checker INFO - Searching for: vainly
03-08-2012 10:07:38 ES Speed Checker INFO - Server time: 4,
03-08-2012 10:07:38 ES Speed Checker INFO - Total time: 21
03-08-2012 10:07:38 ES Speed Checker INFO -
03-08-2012 10:07:43 ES Speed Checker INFO - Searching for: useable
03-08-2012 10:07:43 ES Speed Checker INFO - Server time: 7,
03-08-2012 10:07:43 ES Speed Checker INFO - Total time: 24
03-08-2012 10:07:43 ES Speed Checker INFO -
03-08-2012 10:07:48 ES Speed Checker INFO - Searching for: queasiness
03-08-2012 10:07:48 ES Speed Checker INFO - Server time: 5,
03-08-2012 10:07:48 ES Speed Checker INFO - Total time: 21
03-08-2012 10:07:48 ES Speed Checker INFO -
03-08-2012 10:07:53 ES Speed Checker INFO - Searching for: handcuffing
03-08-2012 10:07:53 ES Speed Checker INFO - Server time: 5,
03-08-2012 10:07:53 ES Speed Checker INFO - Total time: 22
03-08-2012 10:07:53 ES Speed Checker INFO -
03-08-2012 10:07:58 ES Speed Checker INFO - Searching for: suddenness
03-08-2012 10:14:17 ES Speed Checker INFO - Server time: 378891,
03-08-2012 10:14:17 ES Speed Checker INFO - Total time: 378922 (long)
03-08-2012 10:14:17 ES Speed Checker INFO -
03-08-2012 10:14:22 ES Speed Checker INFO - Searching for: prison
03-08-2012 10:14:22 ES Speed Checker INFO - Server time: 14,
03-08-2012 10:14:22 ES Speed Checker INFO - Total time: 32
03-08-2012 10:14:22 ES Speed Checker INFO -
03-08-2012 10:14:27 ES Speed Checker INFO - Searching for: rationales
03-08-2012 10:20:53 ES Speed Checker INFO - Server time: 385608,
03-08-2012 10:20:53 ES Speed Checker INFO - Total time: 385670 (long)
03-08-2012 10:20:53 ES Speed Checker INFO -
03-08-2012 10:20:58 ES Speed Checker INFO - Searching for: outdoors
03-08-2012 10:20:58 ES Speed Checker INFO - Server time: 13,
03-08-2012 10:20:58 ES Speed Checker INFO - Total time: 82
03-08-2012 10:20:58 ES Speed Checker INFO -
03-08-2012 10:21:03 ES Speed Checker INFO - Searching for: supposedly
03-08-2012 10:21:03 ES Speed Checker INFO - Server time: 243,
03-08-2012 10:21:03 ES Speed Checker INFO - Server time: 8.406484,
03-08-2012 10:21:03 ES Speed Checker INFO - Total time: 260
[...]
03-08-2012 10:27:02 ES Speed Checker INFO - Searching for: platitudes
03-08-2012 10:27:02 ES Speed Checker INFO - Server time: 7,
03-08-2012 10:27:02 ES Speed Checker INFO - Total time: 24
03-08-2012 10:27:02 ES Speed Checker INFO -
03-08-2012 10:27:07 ES Speed Checker INFO - Searching for: necessarily
03-08-2012 10:27:07 ES Speed Checker INFO - Server time: 13,
03-08-2012 10:27:07 ES Speed Checker INFO - Total time: 32
03-08-2012 10:27:07 ES Speed Checker INFO -
03-08-2012 10:27:12 ES Speed Checker INFO - Searching for: rusts
03-08-2012 10:32:25 ES Speed Checker INFO - Server time: 312883,
03-08-2012 10:32:25 ES Speed Checker INFO - Server time: 0.7681017,
03-08-2012 10:32:25 ES Speed Checker INFO - Total time: 312918 (long)
03-08-2012 10:32:25 ES Speed Checker INFO -
03-08-2012 10:32:30 ES Speed Checker INFO - Searching for: collated
03-08-2012 10:37:06 ES Speed Checker INFO - Server time: 275658,
03-08-2012 10:37:06 ES Speed Checker INFO - Total time: 275537 (long)
03-08-2012 10:37:06 ES Speed Checker INFO -
03-08-2012 10:37:11 ES Speed Checker INFO - Searching for: burma
03-08-2012 10:43:40 ES Speed Checker INFO - Server time: 389449,
03-08-2012 10:43:40 ES Speed Checker INFO - Total time: 389475 (long)
03-08-2012 10:43:40 ES Speed Checker INFO -
03-08-2012 10:43:45 ES Speed Checker INFO - Searching for: overthrew
03-08-2012 10:43:45 ES Speed Checker INFO - Server time: 14,
03-08-2012 10:43:45 ES Speed Checker INFO - Total time: 35
03-08-2012 10:43:45 ES Speed Checker INFO -
03-08-2012 10:43:50 ES Speed Checker INFO - Searching for: joys
03-08-2012 10:43:50 ES Speed Checker INFO - Server time: 55,
03-08-2012 10:43:50 ES Speed Checker INFO - Total time: 73
03-08-2012 10:43:50 ES Speed Checker INFO -
03-08-2012 10:43:55 ES Speed Checker INFO - Searching for: keels
03-08-2012 10:43:56 ES Speed Checker INFO - Server time: 10,
03-08-2012 10:43:56 ES Speed Checker INFO - Total time: 29