I'm using the Scala ES library elastic4s, which is just a wrapper around the Java client, and I've come across what feels like a bug in some local testing. I've been loading an index with ~30 million documents (each quite small, a few hundred bytes at most). To do this I've been bulk inserting with a bulk size of 1000, and I set the refresh interval to -1 to try to improve performance. I found the ES instance would fairly rapidly fail with OOM.
Watching it in VisualVM you can see the GC kicks in but recovers progressively less as it goes, whereas with an interval of, say, 10s it gives a nice sawtooth.
I'm using ES 2.3.3.
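For reference, the loading loop is essentially the following sketch. `indexBulk` is a hypothetical stand-in for the elastic4s bulk call (the real client API isn't reproduced here), but the batching shape is the same:

```scala
// Sketch of the bulk-loading loop described above: documents are grouped
// into batches of `bulkSize` and each batch is sent as one bulk request.
object BulkLoadSketch {
  val bulkSize = 1000

  // Hypothetical stand-in for the elastic4s bulk request; a no-op here.
  def indexBulk(batch: Seq[String]): Unit = ()

  // Returns the number of documents sent.
  def load(docs: Iterator[String]): Int = {
    var sent = 0
    docs.grouped(bulkSize).foreach { batch =>
      indexBulk(batch)
      sent += batch.size
    }
    sent
  }

  def main(args: Array[String]): Unit = {
    // 2500 tiny JSON documents -> 3 bulks (1000 + 1000 + 500).
    val n = load(Iterator.tabulate(2500)(i => s"""{"id":$i}"""))
    println(n) // 2500
  }
}
```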
How much heap are you assigning? What JVM? What other settings?
JVM - 1.8.0_65-b17
Settings are defaults apart from the cluster name. I started with the default heap, then extended it to 4g and 6g - when watching in VisualVM it clearly wasn't recovering all the memory and would eventually hit whatever limit I set.
When I changed the refresh rate to 10s the GC was stable and the 2g heap was never exhausted.
How many GB is the 30 million documents?
So the latest run inserted ~20 million and that's sized at 4.09 GB - pretty small beer.
What document structure is it, parent/child, other?
Just a flat document - no parent/child or nested documents.
The mapping contains about 6 fields, with some multi-fields including unanalyzed data in the index alongside the default analyzed field.
Can you post mappings and the logs showing OOM?
Images of heap usage below - I have the logs and mappings but am not sure how to upload them - can you advise, please?
Use gist/pastebin/etc for logs and mappings.
How big are the bulk requests you are making?
1000 records per bulk (note this works fine with a refresh interval of 10s)
Hi @warkolm, are you planning to look at this?
I haven't seen anything immediately obvious, sorry.
Maybe someone else can comment as well.
OK - should it be raised as a bug, then?
I had OutOfMemory problems with ES 1.7 (trying to index about 10 million documents).
I experimented with these parameters: the size of a single bulk (N), the sleep time between bulks (S1), and a much longer sleep (S2) after every M bulks.
I ended up with N=5000, S1=1s, M=10, S2=10s.
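The throttling scheme above can be sketched as a plain loop. `indexBulk` and the injectable `sleep` are hypothetical stand-ins (not real client API), so the pacing logic is testable without a cluster:

```scala
// Throttled bulk loading: N docs per bulk, a short pause S1 after each
// bulk, and a longer pause S2 after every M bulks.
object ThrottledLoader {
  def load(docs: Iterator[String],
           n: Int = 5000,
           s1Millis: Long = 1000,
           m: Int = 10,
           s2Millis: Long = 10000,
           indexBulk: Seq[String] => Unit = _ => (), // stand-in for the real bulk call
           sleep: Long => Unit = (ms: Long) => Thread.sleep(ms)): Int = {
    var bulks = 0
    var sent = 0
    docs.grouped(n).foreach { batch =>
      indexBulk(batch)
      sent += batch.size
      bulks += 1
      sleep(s1Millis)                     // short pause between bulks (S1)
      if (bulks % m == 0) sleep(s2Millis) // longer pause every M bulks (S2)
    }
    sent
  }

  def main(args: Array[String]): Unit = {
    // Dry run with tiny pauses recorded instead of slept:
    // 12000 docs, n=5000 -> 3 bulks; long pause fires once (after bulk 2).
    var pauses = 0L
    val total = load(Iterator.fill(12000)("{}"),
      n = 5000, s1Millis = 1, m = 2, s2Millis = 5,
      sleep = ms => pauses += ms)
    println(total)  // 12000
    println(pauses) // 3 * 1ms + 1 * 5ms = 8
  }
}
```

The right values are hardware-dependent, as noted; the point is only that pausing gives the GC and merges time to catch up between bulks.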
I'm sure it is very dependent on the hardware you have; in particular, give ES as much memory as you can!
BTW: Observing CPU, memory, and I/O usage led me to the conclusion that ES needs some time to process bulks, and that it breaks when the usage graph loses its cyclic shape (for example, increased I/O for longer than usual may suggest that ES will break soon - the same as on your diagrams).
BTW2: It's a shame that ES breaks with OutOfMemory. It should simply slow down indexing (a better strategy IMHO) or throw an exception to the caller with the advice "try again later - in about N seconds".