I'm only now getting round to migrating from 0.16.2 to 0.17.9 (I hate
changing stuff that works!!)
I have a single test machine with 8GB of RAM that also runs some other stuff, so I restrict ES to 1GB of memory. This used to work fine with every query I'd ever make.
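(For reference, I restrict the memory the usual way, via the env vars that the startup script reads - assuming the 0.17.x elasticsearch.in.sh still honours these:)

#bash> export ES_MIN_MEM=1g
#bash> export ES_MAX_MEM=1g
#bash> bin/elasticsearch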
Since upgrading to 0.17.9, however, generating facets uses up all the memory and causes the elasticsearch process to hang.
An example query is in the following gist: https://gist.github.com/1370278
(Note that the query is unnecessarily complex simply because it's auto-generated by code that expects much more complex logic than this test needs.)
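(For anyone who doesn't want to wade through the gist, the general shape of the facet part is something like the following - the index and field names here are made up for illustration, not copied from the gist:)

#bash> curl -XPOST 'localhost:9200/myindex/_search?pretty' -d '{
  "query" : { "match_all" : {} },
  "facets" : {
    "by_tag" : { "terms" : { "field" : "tag", "size" : 10 } }
  }
}'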
When executed without the facets, the correct documents are returned. When executed with the facets, the memory jumps to the maximum and the CPU thrashes "forever" (and the query "never" returns):
#bash> top
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 6709 elastics  25   0 1360m 1.1g  10m S 49.3 14.6  0:59.82 java
(If I run the same facet over a subset of the data, the memory jumps to about 900M and the system continues to work.)
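(In case it's useful, I've also been watching the heap via the nodes stats API rather than just top - assuming I'm reading the jvm section of its output correctly:)

#bash> curl 'localhost:9200/_cluster/nodes/stats?pretty'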
The log doesn't show out-of-memory errors for the query itself, though it sometimes gets upset when I try to close the query connection, e.g.:
2011-11-16 09:43:24.293 [WARN] transport.netty:91 - [Ms. Steed] Exception caught on netty layer [[id: 0x744145b1, /REMOTEIP:61282 => /LOCALIP:9300]]
2011-11-16 09:43:28.523 [WARN] netty.channel.DefaultChannelPipeline:91 - An exception was thrown by a user handler while handling an exception event ([id: 0x744145b1, /REMOTEIP:61282 => /LOCALIP:9300] EXCEPTION: java.io.StreamCorruptedException: invalid data length: 0)
java.lang.OutOfMemoryError: Java heap space
2011-11-16 09:43:51.276 [WARN] transport.netty:91 - [Ms. Steed] Exception caught on netty layer [[id: 0x673a95af, /LOCALIP:55230 => /LOCALIP:9300]]
java.lang.OutOfMemoryError: Java heap space
So I can see a few possibilities:
1] Lucene 3.4 is just more memory hungry and I'll have to live with that.
2] Some setting in my facet doesn't make sense and I was just lucky with Lucene 3.1.
3] Some change to elasticsearch makes it more memory hungry, and maybe I have to live with that / maybe it's a bug.
4] I'm missing some configuration setting that would make it handle the memory more gracefully (see my guess just below).
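(On 4], the only candidate I've turned up so far is the field cache that facets use - this is a guess I haven't tested yet, and I'm assuming 0.17.x still lets you pass settings as es.* system properties:)

#bash> bin/elasticsearch -Des.index.cache.field.type=soft

(i.e. use soft references, so the cache can be evicted under memory pressure rather than blowing the heap.)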
Any thoughts much appreciated!
FWIW, I have been running my "deployment" nodes with "-Xms6656m -Xmx6656m -Xmn2048m". After about 4 months of continuous use (including far more numerous and more complex facets and queries), the CPU started to get busy doing GC, though the nodes still responded fairly quickly, i.e. it wasn't quite the same issue described above. I was going to write a cron job that cycles through the nodes, restarting one at a time over a period of a month or so - any thoughts on that strategy?
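(Something like this rough sketch is what I had in mind - assuming one ES node per host, a standard init script, and that I'm using the cluster health API correctly; the node names are hypothetical:)

#!/bin/bash
# Rough sketch of a rolling restart: bounce one node, then wait for
# the cluster to report green again before moving on to the next.
NODES="es1 es2 es3"
for node in $NODES; do
  ssh "$node" "sudo service elasticsearch restart"
  # Block until the cluster is green (retry every 30s; curl simply
  # fails while the node is still coming back up, which is fine).
  until curl -s "http://$node:9200/_cluster/health?wait_for_status=green&timeout=60s" \
      | grep -q '"status":"green"'; do
    sleep 30
  done
done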
(Oh, and apologies if something like this has already been discussed in a similar thread - a quick search didn't throw anything up. I haven't been hanging out here as much as I used to, because I haven't needed to do anything to elasticsearch, functional or otherwise; it just worked perfectly!)