Newbie - memory issues

Hi all,

This is my first ES production setup and I am running into some memory-related
issues. In short, ES starts using all the available RAM, then all the swap, and
then gets killed (monit checks how much memory it uses and kills it, or the
slice needs a restart, depending on how fast monit catches it...). I am
attaching charts of the memory, swap and CPU usage for the past 30 days.
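
For reference, the monit check is along these lines (the threshold shown is
illustrative, not our exact config):

check process elasticsearch with pidfile /home/elasticsearch/run/elasticsearch.pid
  # restart if the process (plus children) holds too much memory for too long
  if totalmem > 3072 MB for 3 cycles then restart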

The thing is, there's nothing on the server yet, as in zero documents:

$ curl localhost:9200/_count
{"count":0,"_shards":{"total":0,"successful":0,"failed":0}}

and no activity (other than the ES-to-ES chatter; this is a 4-node cluster).

The setup is as follows: 4 nodes (2 routing-only nodes, es1 and es2, and 2
data/master nodes, es3 and es4).
The routing nodes have identical configs, and so do the data nodes. Of them
all, only es3 has the memory issues; the others have never needed a restart
since they were set up (100+ days of uptime).
The data nodes are 4GB/4VCPU slices.

We use the latest ES (1.1.1) and the latest Java:

$ java -version
java version "1.7.0_55"
Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)

(We also ran a 1.x version before this, and it did the same thing. I can't
remember whether it was 1.1.0 or 1.0.x.)

The es3 config looks like:
script.disable_dynamic: true
cluster.name: ...
node.name: "es3"
node.master: true
node.data: true
node.max_local_storage_nodes: 1
path.conf: /home/elasticsearch/config
path.data: /home/elasticsearch/data
path.work: /tmp/elasticsearch
path.logs: /var/log/elasticsearch
bootstrap.mlockall: true
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: [...]

and ES is started with:

$ sudo -u elasticsearch ES_INCLUDE=/etc/elasticsearch /usr/local/elasticsearch/bin/elasticsearch -p /home/elasticsearch/run/elasticsearch.pid

where /etc/elasticsearch contains:

ES_HOME=/usr/local/elasticsearch
ES_CLASSPATH=$ES_CLASSPATH:$ES_HOME/lib/:$ES_HOME/lib/sigar/
ES_HEAP_SIZE=2g
ES_JAVA_OPTS="
-server
-Djava.net.preferIPv4Stack=true
-Des.config=/home/elasticsearch/config/elasticsearch.yml
-Xms$ES_HEAP_SIZE
-Xmx$ES_HEAP_SIZE"

and JVM stats look like:
$ curl localhost:9200/_nodes/es3/stats/jvm?pretty
{
  "cluster_name" : "medivo",
  "nodes" : {
    "HpjUsMklQJeOAixUblwb6g" : {
      "timestamp" : 1400001198394,
      "name" : "es3",
      "attributes" : {
        "max_local_storage_nodes" : "1",
        "master" : "true"
      },
      "jvm" : {
        "timestamp" : 1400001198394,
        "uptime_in_millis" : 225671001,
        "mem" : {
          "heap_used_in_bytes" : 236967272,
          "heap_used_percent" : 11,
          "heap_committed_in_bytes" : 2139095040,
          "heap_max_in_bytes" : 2139095040,
          "non_heap_used_in_bytes" : 29450648,
          "non_heap_committed_in_bytes" : 30867456,
          "pools" : {
            "young" : {
              "used_in_bytes" : 134064400,
              "max_in_bytes" : 699924480,
              "peak_used_in_bytes" : 698875904,
              "peak_max_in_bytes" : 699924480
            },
            "survivor" : {
              "used_in_bytes" : 3866624,
              "max_in_bytes" : 8388608,
              "peak_used_in_bytes" : 29209928,
              "peak_max_in_bytes" : 89128960
            },
            "old" : {
              "used_in_bytes" : 99036248,
              "max_in_bytes" : 1431830528,
              "peak_used_in_bytes" : 99036248,
              "peak_max_in_bytes" : 1431830528
            }
          }
        },
        "threads" : {
          "count" : 15130,
          "peak_count" : 15130
        },
        "gc" : {
          "collectors" : {
            "young" : {
              "collection_count" : 33,
              "collection_time_in_millis" : 1168
            },
            "old" : {
              "collection_count" : 0,
              "collection_time_in_millis" : 0
            }
          }
        },
        "buffer_pools" : {
          "direct" : {
            "count" : 48,
            "used_in_bytes" : 14155776,
            "total_capacity_in_bytes" : 14155776
          },
          "mapped" : {
            "count" : 0,
            "used_in_bytes" : 0,
            "total_capacity_in_bytes" : 0
          }
        }
      }
    }
  }
}

When it starts using memory like that, it also fills the logfile with
errors like:
[2014-05-11 01:43:20,090][WARN ][http.netty ] [es3] Caught exception while handling client http traffic, closing connection [id: 0xba7f4f7c, /127.0.0.1:37820 => /127.0.0.1:9200]
java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.addWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source)
    at org.elasticsearch.action.support.TransportAction$ThreadedActionListener.onFailure(TransportAction.java:114)
    at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:66)
    at org.elasticsearch.action.support.master.TransportMasterNodeOperationAction.execute(TransportMasterNodeOperationAction.java:89)
    at org.elasticsearch.action.support.master.TransportMasterNodeOperationAction.execute(TransportMasterNodeOperationAction.java:42)
    at org.elasticsearch.client.node.NodeClusterAdminClient.execute(NodeClusterAdminClient.java:72)
    at org.elasticsearch.client.support.AbstractClusterAdminClient.state(AbstractClusterAdminClient.java:138)
    at org.elasticsearch.rest.action.main.RestMainAction.handleRequest(RestMainAction.java:62)
    at org.elasticsearch.rest.RestController.executeHandler(RestController.java:159)
    at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:142)
    at org.elasticsearch.http.HttpServer.internalDispatchRequest(HttpServer.java:121)
    at org.elasticsearch.http.HttpServer$Dispatcher.dispatchRequest(HttpServer.java:83)
    at org.elasticsearch.http.netty.NettyHttpServerTransport.dispatchRequest(NettyHttpServerTransport.java:291)
    at org.elasticsearch.http.netty.HttpRequestHandler.messageReceived(HttpRequestHandler.java:43)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.common.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:145)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
    at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
    at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
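
For what it's worth, this OutOfMemoryError is about native threads rather than
heap (and heap usage above is low), so presumably some OS-level limit on
threads/processes is being hit. The limits the process actually runs under can
be read with, e.g.:

$ cat /proc/$(cat /home/elasticsearch/run/elasticsearch.pid)/limits
$ ulimit -u    # max user processes for the current shell's user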

Any tips/pointers on how to debug this further would be greatly appreciated.
Could this be a configuration error (and if so, which one?), something else
entirely, etc.?

Thank you very much,

Sincerely,
Alex Ungur


Your thread count on es3 is ridiculously high.

Hint: check your network settings and routing to see whether the es3 node can
connect to all the others. I assume es3 is caught in a loop, desperately
trying to reconnect to something and consuming threads like mad... can you
take a thread snapshot of es3?
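
For example (jstack ships with a JDK rather than a plain JRE; kill -3 works
with any HotSpot JVM and writes the dump to the node's stdout log):

$ jstack -l <es3-pid> > es3-threads.txt
$ kill -3 <es3-pid>

Elasticsearch's hot_threads API is another option:

$ curl localhost:9200/_nodes/es3/hot_threads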

Do you use any plugins?

Jörg

On Tue, May 13, 2014 at 7:52 PM, Alexandru Ungur <alexaandru@gmail.com> wrote:

[...]

Thank you so much! :)

Indeed, on all the others it's like 20-40 max (at peak), but on this one it
just keeps growing (past 16k now).
We do use firewalls on each slice to protect ES from outside access, but they
are configured identically, to allow each other access. I just verified that I
can access each of them on port 9200 from any of them (including from es3).
Right now, the error seems to be:

[2014-05-13 19:15:23,039][WARN ][transport.netty ] [es3] exception caught on transport layer [[id: 0xbfa27705, /10.xxx.zzz.92:47617 :> /10.xxx.zzz.92:9300]], closing connection
java.io.StreamCorruptedException: invalid internal transport message format
    at org.elasticsearch.transport.netty.SizeHeaderFrameDecoder.decode(SizeHeaderFrameDecoder.java:46)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:425)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.cleanup(FrameDecoder.java:482)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.channelDisconnected(FrameDecoder.java:365)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:102)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.elasticsearch.common.netty.channel.Channels.fireChannelDisconnected(Channels.java:396)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.close(AbstractNioWorker.java:360)
    at org.elasticsearch.common.netty.channel.socket.nio.NioServerSocketPipelineSink.handleAcceptedSocket(NioServerSocketPipelineSink.java:81)
    at org.elasticsearch.common.netty.channel.socket.nio.NioServerSocketPipelineSink.eventSunk(NioServerSocketPipelineSink.java:36)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendDownstream(DefaultChannelPipeline.java:574)
    at org.elasticsearch.common.netty.channel.Channels.close(Channels.java:812)
    at org.elasticsearch.common.netty.channel.AbstractChannel.close(AbstractChannel.java:197)
    at org.elasticsearch.transport.netty.NettyTransport.exceptionCaught(NettyTransport.java:523)
    at org.elasticsearch.transport.netty.MessageChannelHandler.exceptionCaught(MessageChannelHandler.java:229)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:112)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.exceptionCaught(FrameDecoder.java:377)
    at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:112)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.elasticsearch.common.netty.channel.Channels.fireExceptionCaught(Channels.java:525)
    at org.elasticsearch.common.netty.channel.AbstractChannelSink.exceptionCaught(AbstractChannelSink.java:48)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.notifyHandlerException(DefaultChannelPipeline.java:658)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:566)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

where that is its own IP address. So ES is failing to connect to its own port
9300? (Or rather, failing to get a meaningful answer from it.)
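
To see what is actually opening those connections to 9300, standard Linux
tools should do (invocations illustrative):

$ sudo netstat -tnp | grep 9300
$ sudo ss -tnp '( sport = :9300 or dport = :9300 )'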

We use no plugins.

Thread snapshot: "ps -eLf" gives 16,347 lines, all looking like these:

UID  PID  PPID LWP   C NLWP  STIME TTY TIME     CMD
1001 4632 4631 27449 0 16346 19:44 ?   00:00:00 /usr/lib/jvm/jre1.7.0_55/bin/java -server -Djava.net.preferIPv4Stack=true -Des.config=/home/elasticsearch/config/elasticsearch.yml -Xms2g -Xmx2g -Delasticsearch -Des.pidfile=/home/elasticsearch/run/elasticsearch.pid -Des.foreground=yes -Des.path.home=/usr/local/elasticsearch -cp :/usr/local/elasticsearch/lib/:/usr/local/elasticsearch/lib/sigar/ org.elasticsearch.bootstrap.Elasticsearch
1001 4632 4631 27475 0 16346 19:44 ?   00:00:00 (same command line)
1001 4632 4631 27478 0 16346 19:44 ?   00:00:00 (same command line)
1001 4632 4631 27505 0 16346 19:44 ?   00:00:00 (same command line)
1001 4632 4631 27509 0 16346 19:44 ?   00:00:00 (same command line)
1001 4632 4631 27513 0 16346 19:44 ?   00:00:00 (same command line)
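
Since ps shows only the shared command line for each LWP, mapping one of those
LWPs to an actual Java thread name takes a JVM thread dump: jstack prints each
thread's nid, which is the LWP in hex (a sketch, assuming a JDK's jstack is
available):

$ printf '0x%x\n' 27449    # LWP 27449 from the first row above
0x6b39
$ jstack 4632 | grep 'nid=0x6b39'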

Thanks again! :)
Alex

On Tuesday, May 13, 2014 at 21:30:14 UTC+3, Jörg Prante wrote:

[...]
