Getting OOMEs (OutOfMemoryErrors) to stop


(Robin Clarke) #1

I have a 10-machine cluster where, frequently (about once per day, when
indexing and querying are at their height), one Elasticsearch node goes OOM.
It usually recovers, but by then the cluster is already redistributing the
lost shards, which causes more load, which in turn often causes an OOM on
another machine.
Each machine has 32 GB of memory, of which I currently have 12 GB allocated
to Elasticsearch. I also run logstash (max 500 MB) and redis (max 2 GB) on
the machines, and see that the remaining ~17 GB is used for the filesystem
cache... i.e. everything looks healthy, right up until the moment
Elasticsearch spews a sequence of errors like this:

Actual Exception
org.elasticsearch.search.query.QueryPhaseExecutionException: [logstash-2014.06.24][1]: query[ConstantScore(:)],from[0],size[0]: Query Failed [Failed to execute main query]
    at org.elasticsearch.search.query.QueryPhase.execute(QueryPhase.java:127)
    at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:257)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:623)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:612)
    at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:270)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.OutOfMemoryError: Java heap space

Failed to send error message back to client for action [search/phase/query]
java.lang.OutOfMemoryError: Java heap space

Actual Exception
org.elasticsearch.index.IndexShardMissingException: [logstash-2014.06.25][3] missing
    at org.elasticsearch.index.service.InternalIndexService.shardSafe(InternalIndexService.java:182)
    at org.elasticsearch.search.SearchService.createContext(SearchService.java:496)
    at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:480)
    at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:252)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:623)
    at org.elasticsearch.search.action.SearchServiceTransportAction$SearchQueryTransportHandler.messageReceived(SearchServiceTransportAction.java:612)
    at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:270)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

Any ideas what might be going wrong here, or what I might be able to do to
remedy the situation?
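
In case it helps with diagnosis, this is roughly how I watch heap pressure per node via the cat API (assuming ES 1.x column names; `localhost:9200` stands in for any one of the nodes):

```shell
# Per-node heap usage and fielddata size (host/port are placeholders).
curl 'localhost:9200/_cat/nodes?v&h=host,heap.percent,heap.max,fielddata.memory_size'
```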

Cheers,
-Robin-

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/096b5a00-745e-4140-a804-5e7b5afcdf9d%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Michael Hart) #2

What do your GC old count and GC old duration look like? Do you have
warnings in the logs about long GCs?
I've got similar issues, and the telltale sign that things are about to go
south is when the old GC count starts to rise and the old GC duration
increases.
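
If you don't have monitoring in place, the counters I mean are in the node stats API (field names as in 1.x; `localhost:9200` is a placeholder):

```shell
# Dump JVM stats for all nodes; look at jvm.gc.collectors.old
# (collection_count and collection_time_in_millis) for each node.
curl 'localhost:9200/_nodes/stats/jvm?pretty'
```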



(Robin Clarke) #3

Very few GC messages in the logs, and none around the OOM instances...
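
One thing I'm considering, since the failing queries look facet-like (from[0],size[0]): on 1.x the fielddata cache is unbounded by default, so capping it in elasticsearch.yml might keep the heap from filling up. Untested on my side, and the 40% figure is just a starting point, not something established in this thread:

```
# elasticsearch.yml -- cap the fielddata cache (unbounded by default on 1.x)
indices.fielddata.cache.size: 40%
```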

Cheers,
-Robin-

On Wednesday, 25 June 2014 16:55:03 UTC+2, Michael Hart wrote:

What does your GC Old Count and GC Old Duration look like? Do you have
warnings in the logs about long GC's?
I've got similar issues and telltale sign of when things are about to go
south is when the old GC count starts to rise, and GC old duration
increases.


