Out of memory, missing shards, looks like split-brain


(Quan Tong Anh) #1

I'm running a 3-node cluster with 2 data nodes. My configuration:

es1, es2:

node:
  name: elasticsearch-1
  master: true
  data: true

discovery:
  zen:
    ping:
      multicast:
        enabled: false
      unicast:
        hosts: ["elasticsearch-1.domain.com:9300","logs.domain.com:9300","elasticsearch-2.domain.com:9300",]

gl2:

node:
  name: graylog2
  master: false
  data: false
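
Since the thread title raises split-brain, one thing worth checking against the configuration above is the master quorum. This is a minimal sketch, assuming the usual majority rule for zen discovery (`discovery.zen.minimum_master_nodes` set to half the master-eligible nodes plus one). The function name is illustrative, and nothing in the post says whether that setting is configured on these nodes.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Return the majority quorum, floor(n / 2) + 1, for n master-eligible nodes."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


# es1 and es2 have master: true; graylog2 has master: false,
# so only two nodes are master-eligible in the posted configuration.
print(minimum_master_nodes(2))
```

With only two master-eligible nodes the quorum is both of them, so losing either one leaves the cluster without a master. A third master-eligible node would keep the quorum at 2 while tolerating one failure.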

Shinken has sent me a notification saying there are only 2 nodes in the
cluster:

{
  "cluster_name" : "domain.com",
  "status" : "red",
  "timed_out" : false,
  "number_of_nodes" : 2,
  "number_of_data_nodes" : 1,
  "active_primary_shards" : 12,
  "active_shards" : 12,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 12
}
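
To make the gap between this response and the intended topology explicit, here is a rough sketch that compares the health output with the expected node counts. The expected values (3 nodes, 2 data nodes) come from the description at the top of the post; the function name and the wording of the messages are illustrative only.

```python
import json

# Cluster health response copied from the notification above.
HEALTH_JSON = """
{
  "cluster_name": "domain.com",
  "status": "red",
  "timed_out": false,
  "number_of_nodes": 2,
  "number_of_data_nodes": 1,
  "active_primary_shards": 12,
  "active_shards": 12,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 12
}
"""


def summarize_health(health, expected_nodes, expected_data_nodes):
    """Return a list of human-readable problems found in a health response."""
    problems = []
    if health["status"] != "green":
        problems.append("status is %s" % health["status"])
    if health["number_of_nodes"] < expected_nodes:
        problems.append("%d of %d nodes present"
                        % (health["number_of_nodes"], expected_nodes))
    if health["number_of_data_nodes"] < expected_data_nodes:
        problems.append("%d of %d data nodes present"
                        % (health["number_of_data_nodes"], expected_data_nodes))
    if health["unassigned_shards"] > 0:
        problems.append("%d unassigned shards" % health["unassigned_shards"])
    return problems


for problem in summarize_health(json.loads(HEALTH_JSON), 3, 2):
    print(problem)
```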

Log on the ES-1:

[2014-06-04 15:51:09,281][WARN ][transport ] [elasticsearch-1] Received response for a request that has timed out, sent [61627ms] ago, timed out [30338ms] ago, action [discovery/zen/fd/masterPing], node [[elasticsearch-2][Vcvb6dtMQf-nfuB-wR9iew][inet[/107.170.x.y:9300]]{master=true}], id [272380]
[2014-06-04 15:51:50,542][WARN ][index.cache.field.data.resident] [elasticsearch-1] [graylog2-graylog2_2] loading field [_date ] caused out of memory failure
java.lang.OutOfMemoryError: Java heap space
[2014-06-04 15:55:16,351][DEBUG][action.admin.indices.stats] [elasticsearch-1] [graylog2-graylog2_5][2], node[Vcvb6dtMQf-nfuB-wR9iew], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.admin.indices.stats.IndicesStatsRequest@7631d2a2]
org.elasticsearch.transport.RemoteTransportException: [elasticsearch-2][inet[/107.170.x.y:9300]][indices/stats/s]
Caused by: org.elasticsearch.index.IndexShardMissingException: [graylog2-graylog2_5][2] missing
    at org.elasticsearch.index.service.InternalIndexService.shardSafe(InternalIndexService.java:179)
    at org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:145)
    at org.elasticsearch.action.admin.indices.stats.TransportIndicesStatsAction.shardOperation(TransportIndicesStatsAction.java:53)
    at org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$ShardTransportHandler.messageReceived(TransportBroadcastOperationAction.java:398)
    at org.elasticsearch.action.support.broadcast.TransportBroadcastOperationAction$ShardTransportHandler.messageReceived(TransportBroadcastOperationAction.java:384)
    at org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:268)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
[2014-06-04 15:56:29,504][WARN ][index.engine.robin ] [elasticsearch-1] [graylog2_recent][0] failed engine
java.lang.OutOfMemoryError: Java heap space
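
To line up the out-of-memory events across nodes, a small log scan can help. The following is only a sketch: the regular expression is inferred from the log lines pasted in this post, not from any documented Elasticsearch log format, and `oom_events` is a hypothetical helper name.

```python
import re

# Header layout inferred from the pasted logs:
# [timestamp][LEVEL][component] [node] message
HEADER = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3})\]"
    r"\[(?P<level>\w+)\s*\]"
    r"\[(?P<component>[\w.]+)\s*\]\s*"
    r"\[(?P<node>[^\]]+?)\s*\]\s*"
    r"(?P<message>.*)$"
)


def oom_events(lines):
    """Return (timestamp, node, component) for each OutOfMemoryError line,
    attributed to the most recent log header that precedes it."""
    events = []
    last_header = None
    for line in lines:
        match = HEADER.match(line)
        if match:
            last_header = match
        elif line.startswith("java.lang.OutOfMemoryError") and last_header:
            events.append((last_header.group("ts"),
                           last_header.group("node"),
                           last_header.group("component")))
    return events


# Two entries taken from the ES-1 log above, with wrapped lines joined.
SAMPLE = [
    "[2014-06-04 15:51:50,542][WARN ][index.cache.field.data.resident] "
    "[elasticsearch-1] [graylog2-graylog2_2] loading field [_date ] "
    "caused out of memory failure",
    "java.lang.OutOfMemoryError: Java heap space",
    "[2014-06-04 15:56:29,504][WARN ][index.engine.robin ] "
    "[elasticsearch-1] [graylog2_recent][0] failed engine",
    "java.lang.OutOfMemoryError: Java heap space",
]

for event in oom_events(SAMPLE):
    print(event)
```

Running the same scan over the logs of both data nodes and sorting by timestamp would show which node ran out of heap first.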

Log on the ES-2:

[2014-06-04 15:51:02,276][WARN ][transport.netty ] [elasticsearch-2] exception caught on transport layer [[id: 0x72906b9d, /107.170.z.t:52899 => /107.170.x.y:9300]], closing connection
java.lang.OutOfMemoryError: Java heap space
    at java.nio.DirectByteBuffer.duplicate(DirectByteBuffer.java:217)
    at org.elasticsearch.common.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:87)
    at org.elasticsearch.common.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:46)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.write0(AbstractNioWorker.java:190)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.writeFromTaskLoop(AbstractNioWorker.java:150)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioChannel$WriteTask.run(AbstractNioChannel.java:335)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:366)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:290)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

[2014-06-04 15:51:27,143][WARN ][indices.cluster ] [elasticsearch-2] [graylog2-graylog2_5][2] master [[elasticsearch-2][Vcvb6dtMQf-nfuB-wR9iew][inet[/107.170.x.y:9300]]{master=true}] marked shard as started, but shard have not been created, mark shard as failed

Log on the GL2:

Jun 4 15:51:11 graylog2 graylog2-server: 2014-06-04 15:51:11,040 WARN : org.graylog2.buffers.processors.OutputBufferProcessor - Timeout reached. Not waiting any longer for writer threads to complete.
Jun 4 15:51:14 graylog2 graylog2-server: 2014-06-04 15:51:14,694 WARN : org.elasticsearch.discovery.zen - [graylog2] master_left and no other node elected to become master, current nodes: {[graylog2][hHcLLZ2GTamMajmE-a5lXg][inet[/107.170.z.t:9300]]{client=true, data=false, master=false},}
Jun 4 15:51:14 graylog2 graylog2-server: 2014-06-04 15:51:14,708 ERROR: org.graylog2.periodical.DeflectorManagerThread - Tried to check for number of messages in current deflector target but did not find index. Aborting.
Jun 4 15:51:14 graylog2 graylog2-server: org.elasticsearch.cluster.block.ClusterBlockException: blocked by: [SERVICE_UNAVAILABLE/1/state not recovered / initialized];[SERVICE_UNAVAILABLE/2/no master];
Jun 4 15:51:14 graylog2 graylog2-server: 2014-06-04 15:51:14,709 ERROR: org.graylog2.periodical.DeflectorManagerThread - Couldn't delete outdated or empty indices
Jun 4 15:52:57 graylog2 graylog2-server: 2014-06-04 15:52:57,339 ERROR: org.graylog2.indexer.EmbeddedElasticSearchClient - Could not read name of ES node.
Jun 4 15:52:57 graylog2 graylog2-server: java.lang.NullPointerException
Jun 4 15:52:57 graylog2 graylog2-server: at org.graylog2.indexer.EmbeddedElasticSearchClient.nodeIdToName(EmbeddedElasticSearchClient.java:135)
Jun 4 15:52:57 graylog2 graylog2-server: at org.graylog2.indexer.DeflectorInformation.getShardInformation(DeflectorInformation.java:125)
Jun 4 15:52:57 graylog2 graylog2-server: at org.graylog2.indexer.DeflectorInformation.getIndexInformation(DeflectorInformation.java:110)
Jun 4 15:52:57 graylog2 graylog2-server: at org.graylog2.indexer.DeflectorInformation.getAsDatabaseObject(DeflectorInformation.java:84)
Jun 4 15:52:57 graylog2 graylog2-server: at org.graylog2.periodical.DeflectorInformationWriterThread.run(DeflectorInformationWriterThread.java:72)
Jun 4 15:52:57 graylog2 graylog2-server: at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
Jun 4 15:52:57 graylog2 graylog2-server: at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
Jun 4 15:52:57 graylog2 graylog2-server: at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
Jun 4 15:52:57 graylog2 graylog2-server: at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
Jun 4 15:52:57 graylog2 graylog2-server: at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
Jun 4 15:52:57 graylog2 graylog2-server: at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
Jun 4 15:52:57 graylog2 graylog2-server: at java.lang.Thread.run(Thread.java:744)

Please let me know if you need further information.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/b8044e30-2246-4246-b9ce-291644ef0021%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Mark Walkom) #2

What do you want to know exactly?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com



(Quan Tong Anh) #3

I would like to know:
- What is the root cause?
- How do I fix it?
- If it's a memory problem, is there anything I can do other than upgrading?



(Mark Walkom) #4
  1. Depends
  2. See 1
  3. Add more nodes, more RAM or reduce your data set

For 1 and 2, you'll have to provide more info on your cluster setup, size
and use.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

org.graylog2.indexer.DeflectorInformation.getShardInformation(DeflectorInformation.java:125)
Jun 4 15:52:57 graylog2 graylog2-server: at
org.graylog2.indexer.DeflectorInformation.getIndexInformation(DeflectorInformation.java:110)
Jun 4 15:52:57 graylog2 graylog2-server: at
org.graylog2.indexer.DeflectorInformation.getAsDatabaseObject(DeflectorInformation.java:84)
Jun 4 15:52:57 graylog2 graylog2-server: at
org.graylog2.periodical.DeflectorInformationWriterThread.run(DeflectorInformationWriterThread.java:72)
Jun 4 15:52:57 graylog2 graylog2-server: at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
Jun 4 15:52:57 graylog2 graylog2-server: at
java.util.concurrent.FutureTask.runAndReset(FutureTask.java:304)
Jun 4 15:52:57 graylog2 graylog2-server: at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:178)
Jun 4 15:52:57 graylog2 graylog2-server: at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
Jun 4 15:52:57 graylog2 graylog2-server: at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
Jun 4 15:52:57 graylog2 graylog2-server: at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
Jun 4 15:52:57 graylog2 graylog2-server: at
java.lang.Thread.run(Thread.java:744)

Please let me know if you need further information.

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/b8044e30-2246-4246-b9ce-291644ef0021%40googlegroups.com
https://groups.google.com/d/msgid/elasticsearch/b8044e30-2246-4246-b9ce-291644ef0021%40googlegroups.com?utm_medium=email&utm_source=footer
.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to a topic in the
Google Groups "elasticsearch" group.
To unsubscribe from this topic, visit
https://groups.google.com/d/topic/elasticsearch/GhKnPvx1rHw/unsubscribe.
To unsubscribe from this group and all its topics, send an email to
elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/CAEM624Y7mgLV-eDwSbyA4SFhYOmAHmyCgzoXrrtvpsGoOhRaeQ%40mail.gmail.com
https://groups.google.com/d/msgid/elasticsearch/CAEM624Y7mgLV-eDwSbyA4SFhYOmAHmyCgzoXrrtvpsGoOhRaeQ%40mail.gmail.com?utm_medium=email&utm_source=footer
.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/81C7B576-CC9A-4F4B-9F70-FF7CD6F0E0CC%40gmail.com
https://groups.google.com/d/msgid/elasticsearch/81C7B576-CC9A-4F4B-9F70-FF7CD6F0E0CC%40gmail.com?utm_medium=email&utm_source=footer
.
For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/CAEM624aOrz8JoUaAa5JA8cmpDgtcY1P2kbpiYhNjJ%3Dr2inaafQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.


(Quan Tong Anh) #5

Full elasticsearch.yml:

bootstrap:
  mlockall: true

cluster:
  name: domain.com
  routing:
    allocation:
      node_concurrent_recoveries: 2

http:
  port: 9200
  enabled: true

transport:
  tcp:
    port: 9300

node:
  name: elasticsearch-1
  master: true
  data: true

discovery:
  zen:
    ping:
      multicast:
        enabled: false
      unicast:
        hosts: ["elasticsearch-1.domain.com:9300","logs.domain.com:9300",
                "elasticsearch-2.domain.com:9300",]
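
The configuration above has two master-eligible nodes (es1 and es2, both with node.master: true) and does not set a master quorum. That may be one way to end up with the split-brain symptoms in the thread title, since either node can elect itself master when it loses contact with the other. A minimal sketch of the usual quorum rule for discovery.zen.minimum_master_nodes, assuming that setting is available in the version in use. The function name `minimum_master_nodes` is hypothetical and only used for illustration:

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Return the quorum of master-eligible nodes: floor(n / 2) + 1."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


# es1 and es2 are master-eligible; gl2 (node.master: false) does not count.
quorum = minimum_master_nodes(2)
print(f"discovery.zen.minimum_master_nodes: {quorum}")
```

With only two master-eligible nodes the quorum is 2, so the cluster has no master whenever either node is unreachable. A third master-eligible node keeps the quorum at 2 while tolerating the loss of one node.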

curl localhost:9200/_stats?pretty:

  "store" : {
    "size" : "24.6gb",
    "size_in_bytes" : 26521096680,
    "throttle_time" : "0s",
    "throttle_time_in_millis" : 0
  },
  "indexing" : {
    "index_total" : 126920,
    "index_time" : "2.7m",
    "index_time_in_millis" : 164595,
    "index_current" : 0,
    "delete_total" : 47005,
    "delete_time" : "3.5s",
    "delete_time_in_millis" : 3545,
    "delete_current" : 0
  },



(Mark Walkom) #6

Ok.
What version are you on, what OS, what java version (release and number),
what are your node specs (RAM, CPU, disk), how much heap are you using, how
many indexes do you have, how many documents are there, what is the average
size of the document, how are you loading data into the cluster, what sort
of queries are you running?

Help us help you by providing as much info as you can :)

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com



(Quan Tong Anh) #7
  • ElasticSearch version: 0.20.5
  • OS: Ubuntu 12.04
  • java version "1.7.0_55"
  • node specs: 2GB RAM, 2 cores 2400 MHz - QEMU Virtual CPU version 1.0
  • heap: -Xms750m -Xmx750m
  • index_total: 126920
  • docs_total: 30137401
  • average size: how can I find this out?
  • how are you loading data into the cluster: we use Graylog2 to insert logs
    into ES

elasticsearch_config_file = /etc/graylog2-elasticsearch.yml
elasticsearch_max_docs_per_index = 2000000
elasticsearch_index_prefix = graylog2-graylog2
elasticsearch_max_number_of_indices = 5
elasticsearch_shards = 4
elasticsearch_replicas = 0
elasticsearch_analyzer = standard
recent_index_ttl_minutes = 30
recent_index_store_type = niofs
force_syslog_rdns = false
allow_override_syslog_date = true
output_batch_size = 5000
processbuffer_processors = 5
outputbuffer_processors = 5
processor_wait_strategy = blocking
ring_size = 1024

mongodb_useauth = false

mongodb_host = 127.0.0.1
mongodb_database = graylog2
mongodb_port = 27017
mongodb_max_connections = 100
mongodb_threads_allowed_to_block_multiplier = 5

use_gelf = true
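
On the average-size question above: a rough estimate is the on-disk store size divided by the document count. A minimal sketch, assuming the size_in_bytes value from the _stats output posted earlier and the docs_total figure above describe the same set of indices. The result includes index overhead, so it only approximates the raw document size. The function name `average_doc_size_bytes` is hypothetical:

```python
def average_doc_size_bytes(store_size_in_bytes: int, doc_count: int) -> float:
    """Estimate the average on-disk size per document, index overhead included."""
    if doc_count <= 0:
        raise ValueError("doc_count must be positive")
    return store_size_in_bytes / doc_count


# Values taken from the posts in this thread.
size_in_bytes = 26521096680  # "store" -> "size_in_bytes" from _stats
docs_total = 30137401        # docs_total reported above

avg = average_doc_size_bytes(size_in_bytes, docs_total)
print(f"about {avg:.0f} bytes per document")
```

With these figures the estimate comes out at roughly 880 bytes per document.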

On Thursday, June 5, 2014 10:55:56 AM UTC+7, Mark Walkom wrote:

Ok.
What version are you on, what OS, what java version (release and number),
what are your node specs (RAM, CPU, disk), how much heap are you using, how
many indexes do you have, how many documents are there, what is the average
size of the document, how are you loading data into the cluster, what sort
of queries are you running?

Help us help you by providing as much info a you can :slight_smile:

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com <javascript:>
web: www.campaignmonitor.com

On 5 June 2014 13:46, Quan Tong Anh <tonganh...@gmail.com <javascript:>>
wrote:

Full elasticsearch.yml:

bootstrap:
mlockall: true

cluster:
name: domain.com
routing:
allocation:
node_concurrent_recoveries: 2
http:
port: 9200
enabled: true

transport:
tcp:
port: 9300

node:
name: elasticsearch-1
master: true
data: true

discovery:
zen:
ping:
multicast:
enabled: false
unicast:
hosts: ["elasticsearch-1.domain.com:9300","logs.domain.com:9300","
elasticsearch-2.domain.com:9300",]

curl localhost:9200/_stats?pretty:

  "store" : {
    "size" : "24.6gb",
    "size_in_bytes" : 26521096680,
    "throttle_time" : "0s",
    "throttle_time_in_millis" : 0
  },
  "indexing" : {
    "index_total" : 126920,
    "index_time" : "2.7m",
    "index_time_in_millis" : 164595,
    "index_current" : 0,
    "delete_total" : 47005,
    "delete_time" : "3.5s",
    "delete_time_in_millis" : 3545,
    "delete_current" : 0
  },

On Thursday, June 5, 2014 10:04:04 AM UTC+7, Mark Walkom wrote:

  1. Depends
  2. See 1
  3. Add more nodes, more RAM or reduce your data set

For 1 and 2, you'll have to provide more info on your cluster setup, size
and use.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com



(Mark Walkom) #8

In that case you're running out of heap. You need to increase your existing
heap (to a max of 50% system memory), or add more nodes, or add more memory
to your existing nodes, or delete some data.
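
As a rough illustration of the 50% guideline above, the sketch below works out a heap target for a 2GB node like the ones described in this thread. The helper name `suggested_heap_mb` and the assumption that the full 2048 MB is visible to the OS are illustrative only, not something taken from the original posts.

```python
def suggested_heap_mb(total_ram_mb, fraction=0.5):
    """Return a heap size in MB as a fraction of system RAM (50% by default)."""
    if total_ram_mb <= 0:
        raise ValueError("total_ram_mb must be positive")
    return int(total_ram_mb * fraction)


# Figures from the thread: 2GB RAM per node, heap currently -Xms750m -Xmx750m.
# Assumption: the whole 2048 MB is available to the operating system.
ram_mb = 2 * 1024
current_heap_mb = 750

target_mb = suggested_heap_mb(ram_mb)
extra_mb = target_mb - current_heap_mb

print(f"suggested JVM flags: -Xms{target_mb}m -Xmx{target_mb}m")
print(f"extra heap compared with the current setting: {extra_mb} MB")
```

On a node this small the gain is only a few hundred MB, so the other options listed above (more nodes, more RAM, less data) may matter more.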

You should really upgrade ES too; you will get a lot of benefits from newer
versions.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com



(Quan Tong Anh) #9

Mark Walkom, what is the relationship between indexes and heap size?
Based on the numbers below, how can I confirm that it's really running out of heap?

On 5 June 2014 15:25, Quan Tong Anh tonganhquan.net@gmail.com wrote:

  • ElasticSearch version: 0.20.5
  • OS: Ubuntu 12.04
  • java version "1.7.0_55"
  • node specs: 2GB RAM, 2 cores 2400 MHz - QEMU Virtual CPU version 1.0
  • heap: -Xms750m -Xmx750m
  • index_total: 126920
  • docs_total: 30137401
  • average size: how can I find out this?
  • how are you loading data into the cluster: we use Graylog2 to insert logs in ES

elasticsearch_config_file = /etc/graylog2-elasticsearch.yml
elasticsearch_max_docs_per_index = 2000000
elasticsearch_index_prefix = graylog2-graylog2
elasticsearch_max_number_of_indices = 5
elasticsearch_shards = 4
elasticsearch_replicas = 0
elasticsearch_analyzer = standard
recent_index_ttl_minutes = 30
recent_index_store_type = niofs
force_syslog_rdns = false
allow_override_syslog_date = true
output_batch_size = 5000
processbuffer_processors = 5
outputbuffer_processors = 5
processor_wait_strategy = blocking
ring_size = 1024

mongodb_useauth = false

mongodb_host = 127.0.0.1
mongodb_database = graylog2
mongodb_port = 27017
mongodb_max_connections = 100
mongodb_threads_allowed_to_block_multiplier = 5

use_gelf = true
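
On the "average size" question in the list above: a rough on-disk figure can be estimated by dividing the store size by the document count. The sketch below uses the `size_in_bytes` value from the `_stats` output posted in this thread together with the `docs_total` figure; the function name is made up for the example. It assumes both numbers cover the same set of indexes, and the result reflects on-disk footprint per document (index structures included), not the size of the original log message.

```python
def average_doc_bytes(store_bytes, doc_count):
    """Estimate the average on-disk bytes per document."""
    if doc_count <= 0:
        raise ValueError("doc_count must be positive")
    return store_bytes / doc_count


# Values quoted in the thread.
store_size_in_bytes = 26521096680  # "store.size_in_bytes" from _stats
docs_total = 30137401              # "docs_total" from the spec list

avg_bytes = average_doc_bytes(store_size_in_bytes, docs_total)
print(f"roughly {avg_bytes:.0f} bytes on disk per document")
```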



(Mark Walkom) #10

The more indexes you have, the more heap you will use, as Elasticsearch has to
maintain metadata for them in memory. This isn't a massive overhead but it all adds up.
Install a plugin like elastichq to monitor things, it'll give you a visual
insight into the status of the cluster and the nodes.
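
If installing a plugin is not an option, heap pressure can also be checked from the JVM section of the node stats response. The sketch below assumes that response has already been fetched and parsed into a dict, and that each node carries `heap_used_in_bytes` and `heap_committed_in_bytes` under `jvm.mem`. Those field names, the 90% threshold and the sample values are assumptions for illustration, not output from this cluster.

```python
def heap_usage(node_stats):
    """Map node name -> used/committed heap ratio from a parsed node-stats dict."""
    usage = {}
    for node_id, node in node_stats.get("nodes", {}).items():
        mem = node.get("jvm", {}).get("mem", {})
        used = mem.get("heap_used_in_bytes")
        committed = mem.get("heap_committed_in_bytes")
        if used is None or not committed:
            continue  # skip nodes that did not report JVM memory figures
        usage[node.get("name", node_id)] = used / committed
    return usage


# Hypothetical sample shaped like a node-stats response (NOT real data).
MB = 1024 ** 2
sample = {
    "nodes": {
        "node-a": {
            "name": "elasticsearch-1",
            "jvm": {"mem": {"heap_used_in_bytes": 730 * MB,
                            "heap_committed_in_bytes": 750 * MB}},
        },
        "node-b": {
            "name": "elasticsearch-2",
            "jvm": {"mem": {"heap_used_in_bytes": 300 * MB,
                            "heap_committed_in_bytes": 750 * MB}},
        },
    }
}

for name, ratio in sorted(heap_usage(sample).items()):
    flag = "HIGH" if ratio > 0.9 else "ok"
    print(f"{name}: {ratio:.0%} of committed heap in use ({flag})")
```

A node that stays near the top of its heap, combined with `java.lang.OutOfMemoryError: Java heap space` entries in its log, is a strong sign that the heap is too small for the data it holds.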

Also, upgrade. I mentioned this before but it's worth mentioning again.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

On 5 June 2014 17:37, Quan Tong Anh tonganhquan.net@gmail.com wrote:

Mark Walkom, what is the relation between index and heap size?
From the below numbers, how can I make sure that it’s really running out
of heap?

On Jun 5, 2014, at 12:41 PM, Mark Walkom markw@campaignmonitor.com
wrote:

In that case you're running out of heap. You need to increase your
existing heap (to a max of 50% system memory), or add more nodes, or add
more memory to your existing nodes, or delete some data.

You should really upgrade ES to, you will get a lot of benefits from newer
versions.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

On 5 June 2014 15:25, Quan Tong Anh tonganhquan.net@gmail.com wrote:

  • ElasticSearch version: 0.20.5
  • OS: Ubuntu 12.04
  • java version “1.7.0_55"
  • node specs: 2GB RAM, 2 cores 2400 MHz - QEMU Virtual CPU version 1.0
  • heap: -Xms750m -Xmx750m
  • index_total: 126920
  • docs_total: 30137401
  • average size: how can I find out this?
  • how are you loading data into the cluster: we use Graylog2 to insert
    logs in ES

elasticsearch_config_file = /etc/graylog2-elasticsearch.ymlelasticsearch_max_docs_per_index
= 2000000
elasticsearch_index_prefix = graylog2-graylog2elasticsearch_max_number_of_indices
= 5
elasticsearch_shards = 4elasticsearch_replicas = 0
elasticsearch_analyzer = standardrecent_index_ttl_minutes = 30
recent_index_store_type = niofsforce_syslog_rdns = false
allow_override_syslog_date = trueoutput_batch_size = 5000
processbuffer_processors = 5outputbuffer_processors = 5
processor_wait_strategy = blocking
ring_size = 1024

mongodb_useauth = false

mongodb_host = 127.0.0.1
mongodb_database = graylog2
mongodb_port = 27017
mongodb_max_connections = 100
mongodb_threads_allowed_to_block_multiplier = 5

use_gelf = true

On Thursday, June 5, 2014 10:55:56 AM UTC+7, Mark Walkom wrote:

Ok.
What version are you on, what OS, what java version (release and
number), what are your node specs (RAM, CPU, disk), how much heap are you
using, how many indexes do you have, how many documents are there, what is
the average size of the document, how are you loading data into the
cluster, what sort of queries are you running?

Help us help you by providing as much info a you can :slight_smile:

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com

On 5 June 2014 13:46, Quan Tong Anh tonganh...@gmail.com wrote:

Full elasticsearch.yml:

bootstrap:
mlockall: true

cluster:
name: domain.com
routing:
allocation:
node_concurrent_recoveries: 2
http:
port: 9200
enabled: true

transport:
tcp:
port: 9300

node:
name: elasticsearch-1
master: true
data: true

discovery:
zen:
ping:
multicast:
enabled: false
unicast:
hosts: ["elasticsearch-1.domain.com:9300","logs.domain.com:9300"
,"elasticsearch-2.domain.com:9300",]

curl localhost:9200/_stats?pretty:

  "store" : {
    "size" : "24.6gb",
    "size_in_bytes" : 26521096680,
    "throttle_time" : "0s",
    "throttle_time_in_millis" : 0
  },
  "indexing" : {
    "index_total" : 126920,
    "index_time" : "2.7m",
    "index_time_in_millis" : 164595,
    "index_current" : 0,
    "delete_total" : 47005,
    "delete_time" : "3.5s",
    "delete_time_in_millis" : 3545,
    "delete_current" : 0
  },

On Thursday, June 5, 2014 10:04:04 AM UTC+7, Mark Walkom wrote:

  1. Depends
  2. See 1
  3. Add more nodes, more RAM or reduce your data set

For 1 and 2, you'll have to provide more info on your cluster setup,
size and use.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com

On 5 June 2014 13:00, Quan Tong Anh tonganh...@gmail.com wrote:

I would like to know:

  • What is the root cause?
  • How do I fix that?
  • If it’s memory problem? Is there anything that I can do (except for
    upgrade)?

On Jun 5, 2014, at 9:54 AM, Mark Walkom ma...@campaignmonitor.com
wrote:

What do you want to know exactly?

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: ma...@campaignmonitor.com
web: www.campaignmonitor.com

On 5 June 2014 12:40, Quan Tong Anh tonganh...@gmail.com wrote:

I'm running a 3-node cluster with 2 data nodes. My configuration:

es1, es2:

node:
name: elasticsearch-1
master: true
data: true

discovery:
zen: ping:
multicast:
enabled: false
unicast:
hosts: ["elasticsearch-1.domain.com:9300","logs.domain.com:9300"
,"elasticsearch-2.domain.com:9300",]

gl2:

node:
name: graylog2
master: false
data: false

Shinken has sent me a notification that said there is only 2 nodes in
cluster:

{
"cluster_name" : "domain.com",
"status" : "red",
"timed_out" : false,
"number_of_nodes" : 2,
"number_of_data_nodes" : 1,
"active_primary_shards" : 12,
"active_shards" : 12,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 12
}

Log on the ES-1:

[2014-06-04 15:51:09,281][WARN ][transport ] [
elasticsearch-1] Received response for a request that has timed out,
sent [61627ms] ago, timed ou
t [30338ms] ago, action [discovery/zen/fd/masterPing], node [[
elasticsearch-2][Vcvb6dtMQf-nfuB-wR9iew][inet[/107.170.x.y:9300]]{master
=true}], id [272380]
[2014-06-04 15:51:50,542][WARN ][index.cache.field.data.resident] [
elasticsearch-1] [graylog2-graylog2_2] loading field [_date ] caused out
of memory failure
java.lang.OutOfMemoryError: Java heap space
[2014-06-04 15:55:16,351][DEBUG][action.admin.indices.stats] [
elasticsearch-1] [graylog2-graylog2_5][2], node[Vcvb6dtMQf-nfuB-wR9iew],
[P], s[STARTED]: Failed
to execute [org.elasticsearch.action.admin.indices.stats.IndicesStatsRe
quest@7631d2a2]
org.elasticsearch.transport.RemoteTransportException: [elasticsearch-2][
inet[/107.170.x.y:9300]][indices/stats/s]
Caused by: org.elasticsearch.index.IndexShardMissingException: [graylog2
-graylog2_5][2] missing
at org.elasticsearch.index.service.InternalIndexService.shardSa
fe(InternalIndexService.java:179)
at org.elasticsearch.action.admin.indices.stats.TransportIndice
sStatsAction.shardOperation(TransportIndicesStatsAction.java:145)
at org.elasticsearch.action.admin.indices.stats.TransportIndice
sStatsAction.shardOperation(TransportIndicesStatsAction.java:53)
at org.elasticsearch.action.support.broadcast.TransportBroadcas
tOperationAction$ShardTransportHandler.messageReceived(Trans
portBroadcastOperationAction.java:398)
at org.elasticsearch.action.support.broadcast.TransportBroadcas
tOperationAction$ShardTransportHandler.messageReceived(Trans
portBroadcastOperationAction.java:384)
at org.elasticsearch.transport.netty.MessageChannelHandler$Requ
estHandler.run(MessageChannelHandler.java:268)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPool
Executor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoo
lExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
[2014-06-04 15:56:29,504][WARN ][index.engine.robin ] [
elasticsearch-1] [graylog2_recent][0] failed engine
java.lang.OutOfMemoryError: Java heap space

Log on the ES-2:

[2014-06-04 15:51:02,276][WARN ][transport.netty ] [elasticsearch-2] exception caught on transport layer [[id: 0x72906b9d, /107.170.z.t:52899 => /107.170.x.y:9300]], closing connection
java.lang.OutOfMemoryError: Java heap space
at java.nio.DirectByteBuffer.duplicate(DirectByteBuffer.java:217)
at org.elasticsearch.common.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:87)
at org.elasticsearch.common.netty.channel.socket.nio.SocketSendBufferPool.acquire(SocketSendBufferPool.java:46)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.write0(AbstractNioWorker.java:190)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.writeFromTaskLoop(AbstractNioWorker.java:150)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioChannel$WriteTask.run(AbstractNioChannel.java:335)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.processTaskQueue(AbstractNioSelector.java:366)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:290)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:88)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

[2014-06-04 15:51:27,143][WARN ][indices.cluster ] [elasticsearch-2] [graylog2-graylog2_5][2] master [[elasticsearch-2][Vcvb6dtMQf-nfuB-wR9iew][inet[/107.170.x.y:9300]]{master=true}] marked shard as started, but shard have not been created, mark shard as failed

Log on the GL2:

...

--
You received this message because you are subscribed to the Google Groups
"elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit
https://groups.google.com/d/msgid/elasticsearch/5d62edc7-f55e-4cc8-b627-f5ec7472855a%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(system) #11