I'm using the Scala ES library elastic4s, which is just a wrapper around the Java client, and I've come across what feels like a bug in some local testing. I've been loading an index with ~30 million documents (each quite small, a few hundred bytes at most). To do this I've been bulk inserting with a bulk size of 1000, and I set the refresh interval to -1 to try to improve performance. I found the ES instance would fairly rapidly fail with OOM.
Watching it in VisualVM you can see the GC kicks in but recovers progressively less as it goes, whereas with an interval of, say, 10s it gives a nice sawtooth.
I'm using ES 2.3.3.
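For reference, the loading loop is essentially the following sketch. `indexBulk` is a hypothetical stand-in for the elastic4s bulk call (the real client API isn't reproduced here), but the batching shape is the same:

```scala
// Sketch of the bulk-loading loop described above: documents are grouped
// into batches of `bulkSize` and each batch is sent as one bulk request.
object BulkLoadSketch {
  val bulkSize = 1000

  // Hypothetical stand-in for the elastic4s bulk request; a no-op here.
  def indexBulk(batch: Seq[String]): Unit = ()

  // Returns the number of documents sent.
  def load(docs: Iterator[String]): Int = {
    var sent = 0
    docs.grouped(bulkSize).foreach { batch =>
      indexBulk(batch)
      sent += batch.size
    }
    sent
  }

  def main(args: Array[String]): Unit = {
    // 2500 tiny JSON documents -> 3 bulks (1000 + 1000 + 500).
    val n = load(Iterator.tabulate(2500)(i => s"""{"id":$i}"""))
    println(n) // 2500
  }
}
```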
How much heap are you assigning? What JVM? What other settings?
JVM - 1.8.0_65-b17
Settings are defaults apart from the cluster name. I started with the default heap, then extended it to 4g and 6g - when watching in VisualVM it clearly wasn't recovering all the memory and would eventually hit whatever limit I set.
When I changed the refresh rate to 10s the GC was stable and the 2g heap was never exhausted.
How many GB is the 30 million documents?
So the latest run inserted ~20 million and that's sized at 4.09 GB - pretty small beer.
What document structure is it, parent/child, other?
Just a flat document - no parent/child or nested documents.
The mapping contains about 6 fields, with some multi-fields including unanalyzed data in the index alongside the default analyzed field.
Can you post mappings and the logs showing OOM?
Images of heap usage below - I have the logs and mappings but am not sure how to upload them - can you advise, please?
Use gist/pastebin/etc for logs and mappings.
How big are the bulk requests you are making?
1000 records per bulk (note this works fine with a refresh interval of 10s)
Hi @warkolm, are you planning to look at this?
I haven't seen anything immediately obvious, sorry.
Maybe someone else can comment as well.
OK - should it be raised as a bug, then?
I had OutOfMemory problems with ES 1.7 (trying to index about 10 million documents).
I experimented with these parameters: the size of a single bulk (N), the sleep time between bulks (S1), and a much longer sleep (S2) after every M bulks.
I ended up with N=5000, S1=1s, M=10, S2=10s.
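The throttling scheme above can be sketched as a plain loop. `indexBulk` and the injectable `sleep` are hypothetical stand-ins (not real client API), so the pacing logic is testable without a cluster:

```scala
// Throttled bulk loading: N docs per bulk, a short pause S1 after each
// bulk, and a longer pause S2 after every M bulks.
object ThrottledLoader {
  def load(docs: Iterator[String],
           n: Int = 5000,
           s1Millis: Long = 1000,
           m: Int = 10,
           s2Millis: Long = 10000,
           indexBulk: Seq[String] => Unit = _ => (), // stand-in for the real bulk call
           sleep: Long => Unit = (ms: Long) => Thread.sleep(ms)): Int = {
    var bulks = 0
    var sent = 0
    docs.grouped(n).foreach { batch =>
      indexBulk(batch)
      sent += batch.size
      bulks += 1
      sleep(s1Millis)                     // short pause between bulks (S1)
      if (bulks % m == 0) sleep(s2Millis) // longer pause every M bulks (S2)
    }
    sent
  }

  def main(args: Array[String]): Unit = {
    // Dry run with tiny pauses recorded instead of slept:
    // 12000 docs, n=5000 -> 3 bulks; long pause fires once (after bulk 2).
    var pauses = 0L
    val total = load(Iterator.fill(12000)("{}"),
      n = 5000, s1Millis = 1, m = 2, s2Millis = 5,
      sleep = ms => pauses += ms)
    println(total)  // 12000
    println(pauses) // 3 * 1ms + 1 * 5ms = 8
  }
}
```

The right values are hardware-dependent, as noted; the point is only that pausing gives the GC and merges time to catch up between bulks.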
I'm sure it is very dependent on the hardware you have; in particular, give ES as much memory as you can!
BTW: Observing CPU, memory, and I/O usage led me to the conclusion that ES needs some time to process bulks, and that it breaks when the usage graph loses its cyclic shape (for example, increased I/O for longer than usual may suggest that ES will break soon - the same as on your diagrams).
BTW2: It's a shame that ES breaks with OutOfMemory. It should simply slow down indexing (a better strategy IMHO) or throw an exception to the caller with the advice "try again later - in about N seconds".