Regarding memory consumption in Elasticsearch

Hi

I am currently using Elasticsearch 0.9.19. The machine I am using has
around 300 GB of disk space and around 23 GB of RAM, of which I have
allocated around 10 GB to Elasticsearch. My workload is write-intensive,
around 1000 docs/s, and Elasticsearch is the only process running on the
machine. The documents are small, with no more than 10 fields each.
Elasticsearch runs on a single machine with 1 shard and 0 replicas.

Memory usage starts increasing very rapidly when I send 1000 docs/s.
Although I have allocated only 10 GB of RAM to Elasticsearch, almost 21 GB
gets consumed, and eventually the Elasticsearch process runs out of heap
space. I then have to clear the OS cache to free all the memory. Even when
I stop sending the 1000 docs/s, the memory is not released automatically.

For example, when I run Elasticsearch with around 1000 docs/s of write
operations, RAM usage climbs to 18 GB very quickly, and when I later reduce
the write rate to only 10 docs/s, the memory used still shows around 18 GB.
I would expect it to come down as the number of write operations decreases.
I am using the bulk API for my writes, with 100 docs per request. The data
comes from 4 machines when the write rate is around 1000 docs/s.

These are the figures which I am getting after doing top:

Mem:  24731664k total, 18252700k used, 6478964k free,  322492k buffers
Swap:  4194296k total,        0k used, 4194296k free, 8749780k cached

  PID USER     PR NI  VIRT  RES SHR S %CPU %MEM     TIME+ COMMAND
 1004 elastics 20  0 10.7g 8.3g 10m S    1 35.3 806:28.69 java

Please tell me if anyone has any idea what the reason for this could be. I
have had to stop my application because of this issue. I think I am missing
some configuration. I have already read all the cache-related documentation
for Elasticsearch here:
http://www.elasticsearch.org/guide/reference/index-modules/cache.html

I have also tried clearing the cache using the clear-cache API, and I tried
the flush API as well, but got no improvement.
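For reference, the calls I made were along these lines (assuming the
default host and port, localhost:9200, and the endpoints from the 0.19-era
docs):

```shell
# Clear all caches across the cluster, then force a flush of the translog
curl -XPOST 'http://localhost:9200/_cache/clear'
curl -XPOST 'http://localhost:9200/_flush'
```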

Thanks in advance.

--

Can you provide us with details of the "out of heap space" messages?

Is it "OutOfMemoryException"?

What kind of API do you use? What client?

Since your top output looks reasonable, there might be other causes.
OutOfMemoryException can also relate to socket resources, for example, if
you don't carefully manage the clients.

It's not the cache, the cache is only for queries.

Best regards,

Jörg

--

Thanks for the reply.

This is the error which I got:
[2012-11-21 00:26:17,510][WARN ][index.engine.robin ] [Primus] [cms_audit][0] failed engine
java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.util.PagedBytes$PagedBytesDataOutput.writeBytes(PagedBytes.java:502)
    at org.apache.lucene.store.DataOutput.writeString(DataOutput.java:114)
    at org.apache.lucene.index.TermInfosReaderIndex.<init>(TermInfosReaderIndex.java:86)
    at org.apache.lucene.index.TermInfosReader.<init>(TermInfosReader.java:116)
    at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:83)
    at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:116)
    at org.apache.lucene.index.IndexWriter$ReaderPool.get(IndexWriter.java:696)
    at org.apache.lucene.index.IndexWriter$ReaderPool.getReadOnlyClone(IndexWriter.java:654)
    at org.apache.lucene.index.DirectoryReader.<init>(DirectoryReader.java:142)
    at org.apache.lucene.index.ReadOnlyDirectoryReader.<init>(ReadOnlyDirectoryReader.java:36)
    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:451)
    at org.apache.lucene.index.IndexWriter.getReader(IndexWriter.java:399)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:296)
    at org.apache.lucene.search.SearcherManager.<init>(SearcherManager.java:82)
    at org.elasticsearch.index.engine.robin.RobinEngine.buildSearchManager(RobinEngine.java:1371)
    at org.elasticsearch.index.engine.robin.RobinEngine.flush(RobinEngine.java:838)
    at org.elasticsearch.index.engine.robin.RobinEngine.updateIndexingBufferSize(RobinEngine.java:221)
    at org.elasticsearch.indices.memory.IndexingMemoryController$ShardsIndicesStatusChecker.run(IndexingMemoryController.java:178)
    at org.elasticsearch.threadpool.ThreadPool$LoggingRunnable.run(ThreadPool.java:297)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)
[2012-11-21 00:26:17,945][WARN ][index.engine.robin ] [Primus] [cms_audit][0] failed to flush after setting shard to inactive
org.elasticsearch.index.engine.FlushFailedEngineException: [cms_audit][0] Flush failed
    at org.elasticsearch.index.engine.robin.RobinEngine.flush(RobinEngine.java:844)
    at org.elasticsearch.index.engine.robin.RobinEngine.updateIndexingBufferSize(RobinEngine.java:221)
    at org.elasticsearch.indices.memory.IndexingMemoryController$ShardsIndicesStatusChecker.run(IndexingMemoryController.java:178)
    at org.elasticsearch.threadpool.ThreadPool$LoggingRunnable.run(ThreadPool.java:297)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:351)
    at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:178)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:165)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:267)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:636)

I am using a Ruby script to send my data to Elasticsearch. This is the
relevant code:
RestClient.post(HOST + "/_bulk", message)
RestClient comes from the rest-client Ruby gem.
The bulk API batch size is 100 docs.
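The bulk body is built roughly along these lines (a sketch reconstructing
my batching; the index name is taken from the log above, and the type name
"doc" is an illustrative assumption):

```ruby
require 'json'

# Build an NDJSON _bulk body: one action line plus one source line per doc.
def bulk_body(docs, index: 'cms_audit', type: 'doc')
  docs.flat_map do |doc|
    [{ index: { _index: index, _type: type } }.to_json, doc.to_json]
  end.join("\n") + "\n" # the bulk API requires a trailing newline
end

body = bulk_body([{ user: 'a' }, { user: 'b' }])
# Sent as in the script above; checking per-item errors is optional:
# response = RestClient.post(HOST + '/_bulk', body)
# result = JSON.parse(response.body)
# failed = result['items'].select { |i| i.values.first['error'] }
```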

Also, please explain why you feel that the top output is reasonable. My
interpretation (which might be wrong) was that 10 GB of RAM should be large
enough for Elasticsearch to work, and that if it allocates more than that,
it should also deallocate the memory, because RAM usage kept increasing
while I was sending data to the server, and it was not reduced or
deallocated when I stopped sending data.

On Monday, 26 November 2012 15:08:50 UTC+5:30, Jörg Prante wrote:

Can you provide us with details of the "out of heap space" messages?

Is it "OutOfMemoryException"?

What kind of API do you use? What client?

Since your top output looks reasonable, there might be other
causes. OutOfMemoryException is also relevant to socket resources for
example, if you don't carefully manage the clients.

It's not the cache, the cache is only for queries.

Best regards,

Jörg


--

The complete error is quite big. Please see the attachment for that.

--

I was unable to attach. Please check the following link for the complete
error.

--

Thanks for the info, but unfortunately the attachment did not come through.
We would be happy to examine the complete stack trace - it shows the reason
for the failure. Can you gist the complete stack trace?

With Ruby using the bulk API, I think one reason could be data congestion
if the BulkResponses are not evaluated. More outstanding BulkResponses show
that ES needs more time. AFAIK, Ruby clients do not take care of throttling
the bulk API in such cases. Ignoring bulk limits can lead to congestion,
because Lucene indexing is very memory-consuming at peak times. You can try
to tune the indexing (segment merging) by decreasing segments_per_tier,
see http://www.elasticsearch.org/guide/reference/index-modules/merge.html
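For example, something along these lines (the index name and the value 5
are only illustrative - the default segments_per_tier is 10 - and depending
on your version you may need to set this at index creation time or in
elasticsearch.yml rather than via the update-settings API):

```shell
# Lower segments_per_tier so segments are merged sooner, trading some
# merge I/O for a smaller in-memory segment footprint
curl -XPUT 'http://localhost:9200/cms_audit/_settings' -d '{
  "index.merge.policy.segments_per_tier": 5
}'
```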

The top output looks reasonable because you configured 10g of heap space
and the resident memory is 8.3g. Note that Java JVMs generally do not give
memory back to the operating system.

Best regards,

Jörg

--

Thanks, yes, it's a sort of memory congestion; the ES heap is fully
allocated.

You could try to estimate how much data per second you write into ES over
the API, and then try to reduce that amount. Throttling on the client side
could be worthwhile, if that is possible in Ruby. And if you know what you
are doing, you could tune the segment merge module
(http://www.elasticsearch.org/guide/reference/index-modules/merge.html) in
order to tell Lucene not to use so much memory and to index more often
instead.
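A client-side throttle is possible in plain Ruby; a minimal sketch (the
target rate and batch size are illustrative assumptions, not
recommendations):

```ruby
BATCH_SIZE = 100
TARGET_DOCS_PER_SEC = 500

# Sleep just long enough that batches of `batch_size` docs average out
# to `target_rate` docs per second.
def throttle(batch_started_at, batch_size, target_rate)
  min_interval = batch_size.to_f / target_rate
  elapsed = Time.now - batch_started_at
  sleep(min_interval - elapsed) if elapsed < min_interval
end

# Usage inside the send loop:
# loop do
#   started = Time.now
#   RestClient.post(HOST + '/_bulk', next_batch) # as in the script above
#   throttle(started, BATCH_SIZE, TARGET_DOCS_PER_SEC)
# end
```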

Best regards,

Jörg

On Monday, November 26, 2012 12:15:35 PM UTC+1, shirsh bansal wrote:

I was unable to attach. Please check the following link for the complete
error.

https://docs.google.com/document/d/1psA843Z6TDupr0IIapY85unh7m5Sqk6AfoqO6CGmDWo/edit

--