Hi all,
This is my first ES production setup and I am running into some
memory-related issues. In short, ES starts using all the available RAM,
then all the swap, and then gets killed (monit checks how much memory it
uses and kills it; otherwise the slice needs a restart, depending on how
fast monit catches it). I am attaching charts of the memory, swap and CPU
usage for the past 30 days.
The thing is, there's nothing on the server yet, as in zero documents:
$ curl localhost:9200/_count
{"count":0,"_shards":{"total":0,"successful":0,"failed":0}}
and no activity (other than the ES-to-ES chatter; this is a 4-node
cluster).
The setup is as follows: 4 nodes (2 routing-only nodes, es1 and es2, and 2
data/master nodes, es3 and es4).
The routing nodes have identical configs, and so do the data nodes. Of all
four, only es3 has the memory issue; the others have never needed
restarting since they were set up (uptime of 100+ days).
Data nodes run on 4GB/4 vCPU slices.
We use the latest ES (1.1.1) and the latest Java:
$ java -version
java version "1.7.0_55"
Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)
(We also used an earlier 1.x version before this one, and it behaved the
same way. I can't remember whether it was 1.1.0 or 1.0.x.)
Configuration looks like:
es3 config:
script.disable_dynamic: true
cluster.name: ...
node.name: "es3"
node.master: true
node.data: true
node.max_local_storage_nodes: 1
path.conf: /home/elasticsearch/config
path.data: /home/elasticsearch/data
path.work: /tmp/elasticsearch
path.logs: /var/log/elasticsearch
bootstrap.mlockall: true
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: [...]
and ES is started with:
sudo -u elasticsearch ES_INCLUDE=/etc/elasticsearch /usr/local/elasticsearch/bin/elasticsearch -p /home/elasticsearch/run/elasticsearch.pid
where /etc/elasticsearch contains:
ES_HOME=/usr/local/elasticsearch
ES_CLASSPATH=$ES_CLASSPATH:$ES_HOME/lib/:$ES_HOME/lib/sigar/
ES_HEAP_SIZE=2g
ES_JAVA_OPTS="
-server
-Djava.net.preferIPv4Stack=true
-Des.config=/home/elasticsearch/config/elasticsearch.yml
-Xms$ES_HEAP_SIZE
-Xmx$ES_HEAP_SIZE"
and JVM stats look like:
$ curl localhost:9200/_nodes/es3/stats/jvm?pretty
{
"cluster_name" : "medivo",
"nodes" : {
"HpjUsMklQJeOAixUblwb6g" : {
"timestamp" : 1400001198394,
"name" : "es3",
"attributes" : {
"max_local_storage_nodes" : "1",
"master" : "true"
},
"jvm" : {
"timestamp" : 1400001198394,
"uptime_in_millis" : 225671001,
"mem" : {
"heap_used_in_bytes" : 236967272,
"heap_used_percent" : 11,
"heap_committed_in_bytes" : 2139095040,
"heap_max_in_bytes" : 2139095040,
"non_heap_used_in_bytes" : 29450648,
"non_heap_committed_in_bytes" : 30867456,
"pools" : {
"young" : {
"used_in_bytes" : 134064400,
"max_in_bytes" : 699924480,
"peak_used_in_bytes" : 698875904,
"peak_max_in_bytes" : 699924480
},
"survivor" : {
"used_in_bytes" : 3866624,
"max_in_bytes" : 8388608,
"peak_used_in_bytes" : 29209928,
"peak_max_in_bytes" : 89128960
},
"old" : {
"used_in_bytes" : 99036248,
"max_in_bytes" : 1431830528,
"peak_used_in_bytes" : 99036248,
"peak_max_in_bytes" : 1431830528
}
}
},
"threads" : {
"count" : 15130,
"peak_count" : 15130
},
"gc" : {
"collectors" : {
"young" : {
"collection_count" : 33,
"collection_time_in_millis" : 1168
},
"old" : {
"collection_count" : 0,
"collection_time_in_millis" : 0
}
}
},
"buffer_pools" : {
"direct" : {
"count" : 48,
"used_in_bytes" : 14155776,
"total_capacity_in_bytes" : 14155776
},
"mapped" : {
"count" : 0,
"used_in_bytes" : 0,
"total_capacity_in_bytes" : 0
}
}
}
}
}
}
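For reference, the two numbers in this output that are easiest to compare over time are heap_used_percent and threads.count. Below is a minimal sketch, assuming Python is available, that pulls both out of a `_nodes/stats/jvm` response. The function name `summarize`, the `thread_limit` threshold of 1000 and the trimmed `SAMPLE` document are hypothetical and only there for illustration; the values in `SAMPLE` are copied from the output above.

```python
import json

# Trimmed copy of the _nodes/es3/stats/jvm response shown above.
# Only the fields read below are kept.
SAMPLE = """
{
  "nodes": {
    "HpjUsMklQJeOAixUblwb6g": {
      "name": "es3",
      "jvm": {
        "mem": {"heap_used_percent": 11},
        "threads": {"count": 15130, "peak_count": 15130}
      }
    }
  }
}
"""


def summarize(stats_json, thread_limit=1000):
    """Return one (name, heap_used_percent, thread_count, over_limit) tuple per node.

    thread_limit is an arbitrary, hypothetical threshold, not an ES default.
    """
    stats = json.loads(stats_json)
    rows = []
    for node in stats["nodes"].values():
        jvm = node["jvm"]
        count = jvm["threads"]["count"]
        rows.append((node["name"], jvm["mem"]["heap_used_percent"], count, count > thread_limit))
    return rows


if __name__ == "__main__":
    for name, heap_pct, threads, over in summarize(SAMPLE):
        flag = "  <-- above threshold" if over else ""
        print(f"{name}: heap {heap_pct}%, threads {threads}{flag}")
```

Feeding it the saved output of the curl command above for each node (instead of `SAMPLE`) would make it easy to see whether es3 differs from es4 in thread count rather than in heap usage.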
When it starts using memory like that, it also fills the logfile with
errors like:
[2014-05-11 01:43:20,090][WARN ][http.netty ] [es3] Caught exception while handling client http traffic, closing connection [id: 0xba7f4f7c, /127.0.0.1:37820 => /127.0.0.1:9200]
java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.addWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor.execute(Unknown Source)
at org.elasticsearch.action.support.TransportAction$ThreadedActionListener.onFailure(TransportAction.java:114)
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:66)
at org.elasticsearch.action.support.master.TransportMasterNodeOperationAction.execute(TransportMasterNodeOperationAction.java:89)
at org.elasticsearch.action.support.master.TransportMasterNodeOperationAction.execute(TransportMasterNodeOperationAction.java:42)
at org.elasticsearch.client.node.NodeClusterAdminClient.execute(NodeClusterAdminClient.java:72)
at org.elasticsearch.client.support.AbstractClusterAdminClient.state(AbstractClusterAdminClient.java:138)
at org.elasticsearch.rest.action.main.RestMainAction.handleRequest(RestMainAction.java:62)
at org.elasticsearch.rest.RestController.executeHandler(RestController.java:159)
at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:142)
at org.elasticsearch.http.HttpServer.internalDispatchRequest(HttpServer.java:121)
at org.elasticsearch.http.HttpServer$Dispatcher.dispatchRequest(HttpServer.java:83)
at org.elasticsearch.http.netty.NettyHttpServerTransport.dispatchRequest(NettyHttpServerTransport.java:291)
at org.elasticsearch.http.netty.HttpRequestHandler.messageReceived(HttpRequestHandler.java:43)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.elasticsearch.common.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:145)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
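Since the error is about native threads and not the Java heap, a rough back-of-the-envelope estimate of what the reported thread count (15130) could reserve in thread stacks may help frame the problem. This is only a sketch under a stated assumption: the 1 MiB per-thread stack size is the usual HotSpot default on 64-bit Linux when -Xss is not set, and it has not been verified on this machine. The figure is reserved address space, not necessarily resident memory, and the helper name `stack_reservation_bytes` is hypothetical.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def stack_reservation_bytes(thread_count, stack_size_bytes=1 * MIB):
    """Upper-bound estimate of address space reserved for thread stacks.

    stack_size_bytes defaults to 1 MiB, an ASSUMED per-thread stack size
    (not read from this JVM); pass the real -Xss value if it is known.
    """
    return thread_count * stack_size_bytes


if __name__ == "__main__":
    threads = 15130        # "threads.count" from the JVM stats in this post
    heap_max = 2139095040  # "heap_max_in_bytes" from the JVM stats in this post

    reserved = stack_reservation_bytes(threads)
    print(f"thread stacks (estimated, reserved): {reserved / GIB:.1f} GiB")
    print(f"configured max heap:                 {heap_max / GIB:.1f} GiB")
```

Under that assumption the stack reservation alone is several times the 2g heap on a 4GB slice, which would be consistent with the heap sitting at 11% while the process still fails to create new threads.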
Any tips/pointers on how to debug this further would be greatly
appreciated. Could this be a configuration error (and if so, which
setting?) or something else?
Thank you very much,
Sincerely,
Alex Ungur