[0.19.1] OutOfMemoryError after setting a large result size

Eric_Jain · March 28, 2012, 1:26am

SearchSourceBuilder search = new SearchSourceBuilder()
.query(...).from(0).size(Integer.MAX_VALUE);
index.search(search).hits();

-> Exception in thread "elasticsearch[search]-pool-47-thread-1" java.lang.OutOfMemoryError: Java heap space
    at org.elasticsearch.search.SearchService.shortcutDocIdsToLoad(SearchService.java:579)
    at org.elasticsearch.search.SearchService.executeFetchPhase(SearchService.java:317)

This would be less surprising if the index held more than a dozen small
documents. It looks like Elasticsearch is preallocating a large array?

Paul_Smith · March 28, 2012, 2:00am

My Lucene internals may be out of date, but if it's the same as a while
back, the PriorityQueue used to hold the results is backed by an array
whose size is the requested result size.

If you want all results, don't 'search'; 'scroll' instead.
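
As a rough illustration of why scrolling avoids the problem, here is a minimal sketch of batch-wise retrieval in plain Java. The Cursor class and its nextPage method are hypothetical stand-ins, not the Elasticsearch client API; the point is only that each request holds at most pageSize results in memory, no matter how many documents match.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for a scroll-style cursor; NOT the Elasticsearch client API.
final class BatchedFetch {

    /** Hypothetical data source that hands out results one bounded page at a time. */
    static final class Cursor {
        private final List<String> matches; // copy taken once, so later changes are not seen
        private int position = 0;

        Cursor(List<String> matches) {
            this.matches = new ArrayList<>(matches);
        }

        /** Returns up to pageSize results, or an empty list once the cursor is exhausted. */
        List<String> nextPage(int pageSize) {
            if (pageSize <= 0) {
                throw new IllegalArgumentException("pageSize must be positive");
            }
            // Compute the end index in long arithmetic so a huge pageSize cannot overflow.
            int end = (int) Math.min((long) position + pageSize, matches.size());
            List<String> page = new ArrayList<>(matches.subList(position, end));
            position = end;
            return page;
        }
    }

    /** Drains the cursor page by page and returns the number of documents seen. */
    static int processAll(Cursor cursor, int pageSize) {
        int seen = 0;
        List<String> page;
        while (!(page = cursor.nextPage(pageSize)).isEmpty()) {
            seen += page.size(); // handle the page here; it can be discarded afterwards
        }
        return seen;
    }
}
```

With this shape, a caller that wants everything picks a modest page size and loops until an empty page comes back, instead of asking for Integer.MAX_VALUE hits in one request.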

Eric_Jain · March 28, 2012, 6:40am

I can do that, but wouldn't it make sense to allocate no more than
Math.min(size, numberOfDocuments)?

Thomas_Peuss · March 28, 2012, 6:56am

Hi Eric!

If you really need all documents in the index, you should have a look
at the "scan/scroll" API.
If you know that your index will only hold a few hundred documents, then why
use MAX_VALUE?

Simple math like Math.min() can be dangerous because you allocate the array
and then start to fill it with hits from the index. While you do that, the
index can grow as fresh documents arrive. This might not be a problem in
your case, but we index 1000 docs/s.

CU
Thomas
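
To make the scenario above concrete, here is a small sketch with made-up numbers. The clamp helper and the counts are hypothetical and only model the situation described in this post (a capacity fixed from an earlier document count, then more matches arriving); they say nothing about how Elasticsearch or Lucene actually size their queues.

```java
// Hypothetical model of sizing a result buffer from a document count observed earlier.
final class ClampedSize {

    /** Caps the requested result size by a document count that was observed at some earlier time. */
    static int clamp(int requestedSize, long observedDocCount) {
        return (int) Math.min((long) requestedSize, Math.max(observedDocCount, 0L));
    }

    public static void main(String[] args) {
        long countAtAllocation = 12;   // documents present when the buffer is sized (made up)
        int capacity = clamp(Integer.MAX_VALUE, countAtAllocation);

        long matchesAtCollection = 15; // three more documents arrived before collection finished (made up)
        long dropped = Math.max(0, matchesAtCollection - capacity);

        // prints "capacity=12 dropped=3"
        System.out.println("capacity=" + capacity + " dropped=" + dropped);
    }
}
```

The clamp itself is cheap; the concern is that the count it relies on may already be stale by the time hits are collected.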

Paul_Smith · March 28, 2012, 6:57am

Lucene optimizes for the common case where only the top X hits are
needed. Sorting by score is more efficient this way, and far less memory
is used. How would you sort, say, a billion documents? Only documents
that match the query/filters pass through the priority queue for
sorting, and in the common case of needing only the top X, that means far
fewer comparisons and far less memory. The larger the requested hit size,
the more comparisons and memory.

Even if you did the Math.min, for those of us with large indexes it would
allocate a huge array when most of the time we only want the first 25.
That's a waste.
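
The top-X idea can be sketched with a bounded min-heap. This is a minimal illustration using java.util.PriorityQueue, not Lucene's own PriorityQueue or Elasticsearch's collector; the BoundedTopK class and its topK method are names made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Illustration only: NOT Lucene's or Elasticsearch's actual implementation.
final class BoundedTopK {

    /** Returns the k highest scores in descending order, keeping at most k entries in the heap. */
    static List<Double> topK(Iterable<Double> scores, int k) {
        if (k <= 0) {
            return new ArrayList<>();
        }
        // Min-heap: the weakest of the current top k sits at the head.
        // The default constructor starts small and grows on demand,
        // so a huge k costs nothing until that many scores have been kept.
        PriorityQueue<Double> heap = new PriorityQueue<>();
        for (double score : scores) {
            if (heap.size() < k) {
                heap.add(score);
            } else if (score > heap.peek()) {
                heap.poll();      // evict the current weakest entry
                heap.add(score);
            }
        }
        List<Double> result = new ArrayList<>(heap);
        result.sort(Collections.reverseOrder());
        return result;
    }

    public static void main(String[] args) {
        List<Double> scores = Arrays.asList(0.3, 2.5, 1.1, 0.9, 4.2, 3.3);
        System.out.println(topK(scores, 3)); // prints [4.2, 3.3, 2.5]
    }
}
```

Each document costs at most one comparison against the head plus an occasional heap update, and memory stays proportional to the number of hits kept rather than the number of documents scanned. A heap that grows on demand trades some resizing work for not reserving the full requested size up front.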

Eric_Jain · March 28, 2012, 5:54pm

On Tue, Mar 27, 2012 at 23:56, Thomas Peuss thomas.peuss@nterra.com wrote:

If you really need all documents in the index, you should have a look
at the "scan/scroll" API.
If you know that your index will only hold a few hundred documents, then why
use MAX_VALUE?

Simple math like Math.min() can be dangerous because you allocate the array
and then start to fill it with hits from the index. While you do that, the
index can grow as fresh documents arrive. This might not be a problem in
your case, but we index 1000 docs/s.

Hadn't considered this case; I wouldn't mind if it was just new docs
that were missing, but having some new docs showing up in place of
older docs could indeed be confusing.
