Bulk index: Java not freeing its memory

I'm creating a rather naive implementation of the bulk API (naive in
the sense that it's not zero-copy, although I plan to support that
later on).

I'm using the Hacker News database dump to insert data en masse into
ES; see:
https://github.com/Mpdreamz/NEST/blob/master/src/HackerNews.Indexer/Program.cs

This fires a lot of HTTP bulk insert calls at ES.
Example call and response: https://github.com/Mpdreamz/NEST/blob/master/docs/debug/bulkinsert.txt

If I step through my code after each
'client.IndexAsync(postQueue);' call, I see Java's memory growing
and never recovering, whilst my indexer frees the postQueue from
memory. Java jumps from 200 MB to 1.5 GB, at which point it
starts to spit out Java heap space errors since it can no longer
allocate any more memory. The indexer stays at around 20 MB throughout
the process.

Any idea what might be going on?

I see that you call IndexAsync. Is there a chance that you are simply
firing too many concurrent bulk indexing requests at the server? Since
you don't wait for each bulk indexing request to come back, there might
eventually be hundreds of concurrent bulk indexing requests in flight on
the server, causing it to max out on memory. If you limit yourself to
10-15 concurrent indexing actions, does it still happen?
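One way to try the 10-15 limit suggested above is to gate the requests behind a semaphore. The following is only a minimal sketch in Python, not NEST's API: `send_bulk`, `index_all`, `run_demo` and `max_concurrent` are hypothetical names, and the `asyncio.sleep` call stands in for the real HTTP bulk request.

```python
import asyncio


async def send_bulk(batch, stats):
    # Stand-in for one HTTP bulk request (assumption: a real client call goes here).
    stats["in_flight"] += 1
    stats["peak"] = max(stats["peak"], stats["in_flight"])
    await asyncio.sleep(0.01)  # simulate the server working on the batch
    stats["in_flight"] -= 1
    return len(batch)


async def index_all(batches, max_concurrent=10):
    # At most max_concurrent bulk requests may be in flight at any time.
    semaphore = asyncio.Semaphore(max_concurrent)
    stats = {"in_flight": 0, "peak": 0}

    async def bounded(batch):
        # Wait for a free slot before sending; the slot is released on exit.
        async with semaphore:
            return await send_bulk(batch, stats)

    indexed = await asyncio.gather(*(bounded(b) for b in batches))
    return sum(indexed), stats["peak"]


def run_demo():
    # 50 batches of 100 made-up documents each.
    batches = [[{"id": i * 100 + j} for j in range(100)] for i in range(50)]
    return asyncio.run(index_all(batches, max_concurrent=10))


if __name__ == "__main__":
    total, peak = run_demo()
    print(f"indexed {total} documents, peak concurrency {peak}")
```

If the out-of-memory errors disappear with a small limit and return with a large one, that points at request concurrency rather than a leak on the server.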

On Sat, Nov 27, 2010 at 9:49 PM, Martijn Laarman mpdreamz@gmail.com wrote:

[...]

That's what I thought at first too, but I put a breakpoint after the
call, and even if I wait for the responses to come back I can see
Java's memory growing gradually.

Even if I use Fiddler to fire the requests manually one by one, I see
Java's memory jump 4 to 10 MB after each request and never get released.

I am using 13.0 with out-of-the-box settings.

Calling _flush has no effect either.

On Sat, Nov 27, 2010 at 10:04 PM, Shay Banon shay.banon@elasticsearch.com wrote:

[...]

The JVM will continue to use memory up to what you set it to use (max mem).
Once the JVM gets that memory, it will not free it to the OS, but
internally, memory will be freed once the garbage collector runs.

When you run it one by one, do you still get OutOfMemory errors?
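The difference between memory the process holds and memory that is actually live can be illustrated with a toy model. This is a deliberately simplified sketch in Python, not the JVM's real allocator or garbage collector: `ToyHeap`, `sequential_requests` and all sizes are made up for illustration.

```python
class ToyHeap:
    """Toy model of a heap with a fixed ceiling (an assumption, not the JVM)."""

    def __init__(self, max_mb):
        self.max_mb = max_mb
        self.committed_mb = 0  # what the OS sees the process holding
        self.live_mb = 0       # objects that are still referenced
        self.garbage_mb = 0    # unreferenced objects not yet collected

    def allocate(self, mb):
        # Collect only when the heap would otherwise overflow.
        if self.live_mb + self.garbage_mb + mb > self.max_mb:
            self.collect()
        if self.live_mb + mb > self.max_mb:
            raise MemoryError("heap space exhausted")
        self.live_mb += mb
        self.committed_mb = max(self.committed_mb,
                                self.live_mb + self.garbage_mb)

    def release(self, mb):
        # The application drops its references; the memory becomes garbage.
        mb = min(mb, self.live_mb)
        self.live_mb -= mb
        self.garbage_mb += mb

    def collect(self):
        # Garbage is reclaimed inside the heap; committed size never shrinks.
        self.garbage_mb = 0


def sequential_requests(heap, count, mb_each):
    # One request at a time: allocate, finish, drop the references.
    for _ in range(count):
        heap.allocate(mb_each)
        heap.release(mb_each)
    return heap.committed_mb


if __name__ == "__main__":
    heap = ToyHeap(max_mb=100)
    committed = sequential_requests(heap, count=50, mb_each=10)
    print(f"committed {committed} MB, live {heap.live_mb} MB")
```

In this model, one-by-one requests make the committed size climb to the ceiling and stay there without ever failing, while holding more than the ceiling live at once raises an error. That is why the process size seen from the OS says little on its own.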

On Sun, Nov 28, 2010 at 3:38 PM, Martijn Laarman mpdreamz@gmail.com wrote:

[...]

Ahh, I saw Java eating memory and assumed the problem was there. NEST now
has built-in semaphore support for async connections, which solved the
issue. Setting maxasyncconnection too high still throws out-of-memory
exceptions, but it's easy to tweak now.

Thanks :)

On Sun, Nov 28, 2010 at 3:15 PM, Shay Banon shay.banon@elasticsearch.com wrote:

[...]