Bug: memory leak while scrolling over index

Hi all,

I'm trying to scroll over all documents of an Elasticsearch index using
a match_all query. I've set ES_HEAP_SIZE to 8G, but I'm not able to
complete the operation because Elasticsearch runs out of memory (see the
log at the end of this mail).

The head plugin tells me the index size is around 400GB with around 9.5M
documents. I'm using a single document type with the following mapping:

{
  "src_doc": {
    "_all": {
      "enabled": false
    },
    "_source": {
      "enabled": false
    },
    "properties": {
      "content": {
        "type": "binary"
      },
      "exception": {
        "index": "no",
        "store": true,
        "type": "string"
      },
      "last_update": {
        "format": "YYYY-MM-dd",
        "store": true,
        "type": "date"
      },
      "title": {
        "index": "no",
        "store": true,
        "type": "string"
      },
      "uid": {
        "index": "no",
        "type": "string"
      },
      "url": {
        "index": "no",
        "store": true,
        "type": "string"
      }
    }
  }
}

I'm using Elasticsearch 0.90.2, but the issue was already present in
0.90.0. The cluster is a single node that is not being used for anything else.

Here's the log:

2013-07-12T13:59:39.40468 java.lang.OutOfMemoryError: Java heap space
2013-07-12T13:59:39.43523 Dumping heap to java_pid8908.hprof ...
2013-07-12T14:00:14.69555 Heap dump file created [8513498041 bytes in 35.278 secs]
2013-07-12T14:00:14.73700 [2013-07-12 16:00:14,702][WARN ][http.netty ] [graph.8908] Caught exception while handling client http traffic, closing connection [id: 0x8d823a3d, /127.0.0.1:52874 => /127.0.0.1:9250]
2013-07-12T14:00:14.73702 java.lang.OutOfMemoryError: Java heap space
2013-07-12T14:00:14.73702 at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
2013-07-12T14:00:14.73702 at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
2013-07-12T14:00:14.73702 at org.elasticsearch.common.netty.buffer.CompositeChannelBuffer.toByteBuffer(CompositeChannelBuffer.java:649)
2013-07-12T14:00:14.73703 at org.elasticsearch.common.netty.buffer.AbstractChannelBuffer.toByteBuffer(AbstractChannelBuffer.java:530)
2013-07-12T14:00:14.73703 at org.elasticsearch.common.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:77)
2013-07-12T14:00:14.73703 at org.elasticsearch.common.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:46)
2013-07-12T14:00:14.73703 at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.write0(AbstractNioWorker.java:194)
2013-07-12T14:00:14.73703 at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.writeFromTaskLoop(AbstractNioWorker.java:152)
2013-07-12T14:00:14.73704 at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioChannel$WriteTask.run(AbstractNioChannel.java:335)
2013-07-12T14:00:14.73704 at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:366)
2013-07-12T14:00:14.73705 at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:290)
2013-07-12T14:00:14.73706 at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
2013-07-12T14:00:14.73706 at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
2013-07-12T14:00:14.73706 at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
2013-07-12T14:00:14.73706 at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
2013-07-12T14:00:14.73706 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
2013-07-12T14:00:14.73707 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
2013-07-12T14:00:14.73707 at java.lang.Thread.run(Thread.java:722)
2013-07-12T14:00:18.29386 [2013-07-12 16:00:18,293][WARN ][http.netty ] [graph.8908] Caught exception while handling client http traffic, closing connection [id: 0xd2b513bc, /127.0.0.1:52875 => /127.0.0.1:9250]
2013-07-12T14:00:18.29388 java.lang.OutOfMemoryError: Java heap space
2013-07-12T14:00:18.29388 at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:57)
2013-07-12T14:00:18.29388 at java.nio.ByteBuffer.allocate(ByteBuffer.java:331)
2013-07-12T14:00:18.29388 at org.elasticsearch.common.netty.buffer.CompositeChannelBuffer.toByteBuffer(CompositeChannelBuffer.java:649)
2013-07-12T14:00:18.29389 at org.elasticsearch.common.netty.buffer.AbstractChannelBuffer.toByteBuffer(AbstractChannelBuffer.java:530)
2013-07-12T14:00:18.29389 at org.elasticsearch.common.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:77)
2013-07-12T14:00:18.29389 at org.elasticsearch.common.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:46)
2013-07-12T14:00:18.29389 at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.write0(AbstractNioWorker.java:194)
2013-07-12T14:00:18.29389 at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.writeFromTaskLoop(AbstractNioWorker.java:152)
2013-07-12T14:00:18.29390 at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioChannel$WriteTask.run(AbstractNioChannel.java:335)
2013-07-12T14:00:18.29390 at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:366)
2013-07-12T14:00:18.29391 at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:290)
2013-07-12T14:00:18.29391 at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
2013-07-12T14:00:18.29391 at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
2013-07-12T14:00:18.29391 at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
2013-07-12T14:00:18.29391 at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
2013-07-12T14:00:18.29391 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
2013-07-12T14:00:18.29392 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
2013-07-12T14:00:18.29392 at java.lang.Thread.run(Thread.java:722)

--
Cheers
Ralf


Ralf,

What is your size limit? Without any details, your description implies that
you are trying to return all of those 9.5M documents in one response.

You can use a scroll query with a relatively small size limit (say, 100 or
so). Then take the scroll ID from each response to feed back into the next
scan.

In my experience, match_all is fine. Just don't try to return all
9.5M documents in one response.
See the Elasticsearch documentation on the scan and scroll search types
for ideas.
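
To make that concrete, here is a minimal sketch of such a loop against the
REST API. The index name, host/port, field list, and page size are
placeholders (your actual values will differ), and any HTTP client works
the same way:

import json
import requests  # any HTTP client will do; requests keeps the sketch short

ES = "http://127.0.0.1:9200"   # default HTTP port; adjust for your node
INDEX = "src_index"            # placeholder index name

# Start a scan/scroll: match_all, small page size, stored fields only
# (your mapping disables _source, so ask for the stored fields explicitly).
# Note: with search_type=scan, size applies per shard.
resp = requests.post(
    ES + "/" + INDEX + "/src_doc/_search",
    params={"search_type": "scan", "scroll": "5m", "size": 100},
    data=json.dumps({
        "query": {"match_all": {}},
        "fields": ["title", "url", "last_update"],
    }),
).json()
scroll_id = resp["_scroll_id"]

# Feed the scroll ID from each response into the next request,
# until a page comes back empty.
while True:
    resp = requests.post(ES + "/_search/scroll",
                         params={"scroll": "5m"},
                         data=scroll_id).json()
    hits = resp["hits"]["hits"]
    if not hits:
        break
    scroll_id = resp["_scroll_id"]
    for hit in hits:
        pass  # process hit["fields"] here

The scan search type also skips scoring and sorting, so each page stays
cheap regardless of how large the index is.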

On Monday, July 15, 2013 7:22:27 AM UTC-4, Ralf Schmitt wrote:

Hi all,

I'm trying to scroll over all documents of an Elasticsearch Index using
a match_all query. I've set ES_HEAP_SIZE to 8G but I'm not able to
complete the operation, because elasticsearch runs out of memory (see
log at the end of this mail).


It would be nice to see the source code for how you scroll over
the index.

Jörg


"joergprante@gmail.com" joergprante@gmail.com writes:

It would be nice to see the source code for how you scroll over
the index.

Thanks, I've already opened an issue on GitHub:

The issue is already closed and I'm waiting for an ES version that ships
with Lucene 4.4.

--
Cheers
Ralf
