Hi all,
I'm hoping someone can help me piece together the log entries, stack
traces, and exceptions below. I have a 3-node development cluster in EC2, and two
of the nodes had issues. I'm running ES 1.4.4 on servers dedicated to ES, each
with 32GB RAM and a 16GB heap. My index rate averages about 10k/sec. There were no
searches going on at the time of the incident.
It appears to me that node 10.0.0.12 began timing out requests to 10.0.0.45,
indicating that 10.0.0.45 was having issues.
Then at 4:36, 10.0.0.12 logs the ERROR about "Uncaught exception:
IndexWriter already closed", caused by an OOME.
Then at 4:43, 10.0.0.45 hits the "Create failed" WARN, and logs an OOME.
Then things are basically down and unresponsive.
What is weird to me is that if 10.0.0.45 was the node having issues, why
did 10.0.0.12 log an exception 7 minutes before that? Did both nodes run
out of memory? Or is one of the exceptions actually saying, "I see that
this other node hit an OOME, and I'm telling you about it"?
I have tweaked a few values in the elasticsearch.yml file to try to keep
this from happening (configured from Puppet; a sketch of how I expect these to
render in elasticsearch.yml follows the list):
'indices.breaker.fielddata.limit' => '20%',
'indices.breaker.total.limit' => '25%',
'indices.breaker.request.limit' => '10%',
'index.merge.scheduler.type' => 'concurrent',
'index.merge.scheduler.max_thread_count' => '1',
'index.merge.policy.type' => 'tiered',
'index.merge.policy.max_merged_segment' => '1gb',
'index.merge.policy.segments_per_tier' => '4',
'index.merge.policy.max_merge_at_once' => '4',
'index.merge.policy.max_merge_at_once_explicit' => '4',
'indices.memory.index_buffer_size' => '10%',
'indices.store.throttle.type' => 'none',
'index.translog.flush_threshold_size' => '1GB',
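For what it's worth, assuming the Puppet template just writes those keys out
flat (I haven't pasted my actual generated file), the corresponding
elasticsearch.yml lines would look something like this:

    indices.breaker.fielddata.limit: 20%
    indices.breaker.total.limit: 25%
    indices.breaker.request.limit: 10%
    index.merge.scheduler.type: concurrent
    index.merge.scheduler.max_thread_count: 1
    index.merge.policy.type: tiered
    index.merge.policy.max_merged_segment: 1gb
    index.merge.policy.segments_per_tier: 4
    index.merge.policy.max_merge_at_once: 4
    index.merge.policy.max_merge_at_once_explicit: 4
    indices.memory.index_buffer_size: 10%
    indices.store.throttle.type: none
    index.translog.flush_threshold_size: 1GB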
I have done a fair bit of reading on this, and have tried just about
everything I can think of.
Can anyone tell me what caused this scenario, and what can be done to avoid
it?
Thank you so much for taking the time to read this.
Chris
=====
On server 10.0.0.12:
[2015-03-04 03:56:12,548][WARN ][transport ] [elasticsearch-ip-10-0-0-12] Received response for a request that has timed out, sent [20456ms] ago, timed out [5392ms] ago, action [cluster:monitor/nodes/stats[n]], node [[elasticsearch-ip-10-0-0-45][i4gmsxs0Q0eyvPWjajNV5A][ip-10-0-0-45.us-west-2.compute.internal][inet[ip-10-0-0-45.us-west-2.compute.internal/10.0.0.45:9300]]{master=true}], id [70061596]
[2015-03-04 04:06:02,407][INFO ][index.engine.internal ] [elasticsearch-ip-10-0-0-12] [derbysoft-ihg-20150304][2] now throttling indexing: numMergesInFlight=4, maxNumMerges=3
[2015-03-04 04:06:04,141][INFO ][index.engine.internal ] [elasticsearch-ip-10-0-0-12] [derbysoft-ihg-20150304][2] stop throttling indexing: numMergesInFlight=2, maxNumMerges=3
[2015-03-04 04:12:26,194][WARN ][transport ] [elasticsearch-ip-10-0-0-12] Received response for a request that has timed out, sent [15709ms] ago, timed out [708ms] ago, action [cluster:monitor/nodes/stats[n]], node [[elasticsearch-ip-10-0-0-45][i4gmsxs0Q0eyvPWjajNV5A][ip-10-0-0-45.us-west-2.compute.internal][inet[ip-10-0-0-45.us-west-2.compute.internal/10.0.0.45:9300]]{master=true}], id [70098828]
[2015-03-04 04:23:40,778][WARN ][transport ] [elasticsearch-ip-10-0-0-12] Received response for a request that has timed out, sent [21030ms] ago, timed out [6030ms] ago, action [cluster:monitor/nodes/stats[n]], node [[elasticsearch-ip-10-0-0-45][i4gmsxs0Q0eyvPWjajNV5A][ip-10-0-0-45.us-west-2.compute.internal][inet[ip-10-0-0-45.us-west-2.compute.internal/10.0.0.45:9300]]{master=true}], id [70124234]
[2015-03-04 04:24:47,023][WARN ][transport ] [elasticsearch-ip-10-0-0-12] Received response for a request that has timed out, sent [27275ms] ago, timed out [12275ms] ago, action [cluster:monitor/nodes/stats[n]], node [[elasticsearch-ip-10-0-0-45][i4gmsxs0Q0eyvPWjajNV5A][ip-10-0-0-45.us-west-2.compute.internal][inet[ip-10-0-0-45.us-west-2.compute.internal/10.0.0.45:9300]]{master=true}], id [70126273]
[2015-03-04 04:25:39,180][WARN ][transport ] [elasticsearch-ip-10-0-0-12] Received response for a request that has timed out, sent [19431ms] ago, timed out [4431ms] ago, action [cluster:monitor/nodes/stats[n]], node [[elasticsearch-ip-10-0-0-45][i4gmsxs0Q0eyvPWjajNV5A][ip-10-0-0-45.us-west-2.compute.internal][inet[ip-10-0-0-45.us-west-2.compute.internal/10.0.0.45:9300]]{master=true}], id [70127835]
[2015-03-04 04:26:40,775][WARN ][transport ] [elasticsearch-ip-10-0-0-12] Received response for a request that has timed out, sent [19241ms] ago, timed out [4241ms] ago, action [cluster:monitor/nodes/stats[n]], node [[elasticsearch-ip-10-0-0-45][i4gmsxs0Q0eyvPWjajNV5A][ip-10-0-0-45.us-west-2.compute.internal][inet[ip-10-0-0-45.us-west-2.compute.internal/10.0.0.45:9300]]{master=true}], id [70129981]
[2015-03-04 04:27:14,329][WARN ][transport ] [elasticsearch-ip-10-0-0-12] Received response for a request that has timed out, sent [22676ms] ago, timed out [6688ms] ago, action [cluster:monitor/nodes/stats[n]], node [[elasticsearch-ip-10-0-0-45][i4gmsxs0Q0eyvPWjajNV5A][ip-10-0-0-45.us-west-2.compute.internal][inet[ip-10-0-0-45.us-west-2.compute.internal/10.0.0.45:9300]]{master=true}], id [70130668]
[2015-03-04 04:28:15,695][WARN ][transport ] [elasticsearch-ip-10-0-0-12] Received response for a request that has timed out, sent [24042ms] ago, timed out [9041ms] ago, action [cluster:monitor/nodes/stats[n]], node [[elasticsearch-ip-10-0-0-45][i4gmsxs0Q0eyvPWjajNV5A][ip-10-0-0-45.us-west-2.compute.internal][inet[ip-10-0-0-45.us-west-2.compute.internal/10.0.0.45:9300]]{master=true}], id [70132644]
[2015-03-04 04:29:38,102][WARN ][transport ] [elasticsearch-ip-10-0-0-12] Received response for a request that has timed out, sent [16448ms] ago, timed out [1448ms] ago, action [cluster:monitor/nodes/stats[n]], node [[elasticsearch-ip-10-0-0-45][i4gmsxs0Q0eyvPWjajNV5A][ip-10-0-0-45.us-west-2.compute.internal][inet[ip-10-0-0-45.us-west-2.compute.internal/10.0.0.45:9300]]{master=true}], id [70135333]
[2015-03-04 04:33:42,393][WARN ][transport ] [elasticsearch-ip-10-0-0-12] Received response for a request that has timed out, sent [20738ms] ago, timed out [5737ms] ago, action [cluster:monitor/nodes/stats[n]], node [[elasticsearch-ip-10-0-0-45][i4gmsxs0Q0eyvPWjajNV5A][ip-10-0-0-45.us-west-2.compute.internal][inet[ip-10-0-0-45.us-west-2.compute.internal/10.0.0.45:9300]]{master=true}], id [70142427]
[2015-03-04 04:36:08,788][ERROR][marvel.agent ] [elasticsearch-ip-10-0-0-12] Background thread had an uncaught exception:
org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
        at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:698)
        at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:712)
        at org.apache.lucene.index.IndexWriter.ramBytesUsed(IndexWriter.java:462)
        at org.elasticsearch.index.engine.internal.InternalEngine.segmentsStats(InternalEngine.java:1224)
        at org.elasticsearch.index.shard.service.InternalIndexShard.segmentStats(InternalIndexShard.java:555)
        at org.elasticsearch.action.admin.indices.stats.CommonStats.<init>(CommonStats.java:170)
        at org.elasticsearch.action.admin.indices.stats.ShardStats.<init>(ShardStats.java:49)
        at org.elasticsearch.indices.InternalIndicesService.stats(InternalIndicesService.java:212)
        at org.elasticsearch.indices.InternalIndicesService.stats(InternalIndicesService.java:172)
        at org.elasticsearch.node.service.NodeService.stats(NodeService.java:138)
        at org.elasticsearch.marvel.agent.AgentService$ExportingWorker.exportNodeStats(AgentService.java:300)
        at org.elasticsearch.marvel.agent.AgentService$ExportingWorker.run(AgentService.java:225)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Java heap space
=====
On server 10.0.0.45:
[2015-03-04 04:43:27,245][WARN ][index.engine.internal ] [elasticsearch-ip-10-0-0-45] [myindex-20150304][1] failed engine [indices:data/write/bulk[s] failed on replica]
org.elasticsearch.index.engine.CreateFailedEngineException: [myindex-20150304][1] Create failed for [my_type#AUvjGHoiku-fZf277h_4]
        at org.elasticsearch.index.engine.internal.InternalEngine.create(InternalEngine.java:421)
        at org.elasticsearch.index.shard.service.InternalIndexShard.create(InternalIndexShard.java:403)
        at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnReplica(TransportShardBulkAction.java:595)
        at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$ReplicaOperationTransportHandler.messageReceived(TransportShardReplicationOperationAction.java:246)
        at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$ReplicaOperationTransportHandler.messageReceived(TransportShardReplicationOperationAction.java:225)
        at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:275)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed
        at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:698)
        at org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:712)
        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1507)
        at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1246)
        at org.elasticsearch.index.engine.internal.InternalEngine.innerCreateNoLock(InternalEngine.java:502)
        at org.elasticsearch.index.engine.internal.InternalEngine.innerCreate(InternalEngine.java:444)
        at org.elasticsearch.index.engine.internal.InternalEngine.create(InternalEngine.java:413)
        ... 8 more
Caused by: java.lang.OutOfMemoryError: Java heap space
=====