Hi, I'm having a problem when performing bulk inserts into elasticsearch.
The short story: I'm inserting batches of about 100,000 documents (roughly
70 MB) about once every 1.5 minutes, and after around 10 million documents
have been inserted I start getting OutOfMemory exceptions and elasticsearch
becomes unresponsive until I restart it.
Here's the longer version:
Background:
I'm running ElasticSearch 0.19.11 on Debian Squeeze with OpenJDK; here's
the output from java -version:
java version "1.6.0_18"
OpenJDK Runtime Environment (IcedTea6 1.8.13) (6b18-1.8.13-0+squeeze2)
OpenJDK 64-Bit Server VM (build 14.0-b16, mixed mode)
My elasticsearch.yml is here: https://gist.github.com/4318237
And my elasticsearch startup script is here:
https://gist.github.com/4318572
Problem Description:
I have some C# code which denormalizes some SQL data and inserts it into my
elasticsearch instance using the Bulk API (via the NEST .NET elasticsearch
client). My batches are around 100,000 documents per request, which is
about 70 MB of data, and I send one roughly every 1.5 minutes.
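For reference, here's a stripped-down sketch of what each batch amounts to on
the wire. This is illustrative only (my real code goes through NEST, and the
index/type/document names below are made up), but each request is effectively
a single POST of newline-delimited JSON to the _bulk endpoint:

using System;
using System.Collections.Generic;
using System.Net;
using System.Text;

class BulkLoaderSketch
{
    // Illustrative endpoint; the real loader goes through NEST against the same node.
    const string BulkUrl = "http://localhost:9200/_bulk";

    static void SendBatch(IDictionary<string, string> docsById)
    {
        // The bulk API expects one action line and one source line per document,
        // each terminated by a newline.
        var body = new StringBuilder();
        foreach (var doc in docsById)
        {
            body.AppendFormat(
                "{{\"index\":{{\"_index\":\"myindex\",\"_type\":\"mytype\",\"_id\":\"{0}\"}}}}\n",
                doc.Key);
            body.Append(doc.Value);  // doc.Value is the document's JSON source
            body.Append('\n');
        }

        using (var client = new WebClient())
        {
            client.Headers[HttpRequestHeader.ContentType] = "application/json";
            // One POST like this, with ~100,000 documents (~70 MB), roughly every 90 seconds.
            client.UploadString(BulkUrl, body.ToString());
        }
    }

    static void Main()
    {
        // Tiny fake batch just to show the shape; the real batches are ~100,000 docs.
        var docs = new Dictionary<string, string>
        {
            { "1", "{\"title\":\"first doc\"}" },
            { "2", "{\"title\":\"second doc\"}" }
        };
        SendBatch(docs);
        Console.WriteLine("Batch sent.");
    }
}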
Everything seems fine to start with: heap usage goes up and down in the
pattern I would expect (sorry, no screenshot of that) until I get to around
5 million documents inserted.
At this point I start to get a lot of gc ConcurrentMarkSweep warnings in the
log. Here's a capture from my log when these warnings start to appear:
https://gist.github.com/4318175. The API is now lagging and takes around
5 seconds to respond. Also, here's the output from the hot_threads API:
https://gist.github.com/4318193.
From then on, the time a ConcurrentMarkSweep GC takes increases steadily
along with the size of the heap. Here's another capture from the log showing
the heap growing and the GC duration increasing:
https://gist.github.com/4318188, and here is the output from hot_threads
again: https://gist.github.com/4318197. Now the API is really unresponsive,
taking around 12s to respond (I guess this will always be roughly
proportional to the GC duration).
At this point the heap looks to be growing constantly and each garbage
collection only reclaims a tiny amount. Here's what bigdesk looked like at
this stage: http://tinypic.com/r/1611gtl/6, and here's another bigdesk
screenshot taken a little while later showing the heap still climbing:
http://tinypic.com/view.php?pic=2z7o5tu&s=6.
This behaviour carries on until the heap is at its limit and collections are
taking > 20s. At that point the API is almost completely unresponsive and I
start getting OutOfMemory exceptions. Here's the log output at this point:
https://gist.github.com/4318255, and here's bigdesk:
http://tinypic.com/r/nlnyms/6.
If I restart elasticsearch, the heap goes back to a normal size, the API
becomes responsive again and all the warnings stop until, of course, I've
inserted another 5 million documents through my batch inserter. Here's
bigdesk after a restart of elasticsearch: http://tinypic.com/r/290sivc/6.
The Question:
What could be causing this behaviour? I've read this article (awesome
article, by the way):
http://jprante.github.com/2012/11/28/Elasticsearch-Java-Virtual-Machine-settings-explained.html
but given my lack of JVM knowledge I'm not sure whether this could be a case
of setting ES_HEAP_SIZE too large (it's 6 GB), or whether it's something
else entirely, like my version of OpenJDK.
Any thoughts are greatly appreciated, and if you need more info please ask.
Regards,
James