It's in the index statistics, under the memory row.
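If it helps, here is a minimal sketch of pulling that number out of a nodes-stats response; the JSON shape follows the 1.x `GET /_nodes/stats/indices/fielddata` output as I understand it, and the node id and values below are invented for illustration:

```python
# Extract per-node fielddata memory from a nodes-stats response.
# The "stats" dict is a hand-written sample in the rough shape returned by
# GET /_nodes/stats/indices/fielddata in ES 1.x (numbers are made up).
stats = {
    "nodes": {
        "abc123": {  # hypothetical node id
            "name": "elastic ASIC nodo 3",
            "indices": {
                "fielddata": {
                    "memory_size_in_bytes": 21474836480,  # ~20 GB
                    "evictions": 0,
                }
            },
        }
    }
}

for node_id, node in stats["nodes"].items():
    fd = node["indices"]["fielddata"]
    gb = fd["memory_size_in_bytes"] / 1024 ** 3
    print(f"{node['name']}: fielddata {gb:.1f} GB, evictions {fd['evictions']}")
```

With a 30 GB heap, a fielddata figure anywhere near that size would explain the old-gen pressure seen in the GC logs.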
--
David Pilato | Technical Advocate | Elasticsearch.com
@dadoonet | @elasticsearchfr
On 30 May 2014, at 09:31:33, Jorge Ferrando (jorfermo@gmail.com) wrote:
I don't have older metrics in Marvel. I turned it on a few days ago to see if it could help solve the problem.
I couldn't find field data memory in the node statistics. Where can I find it?
On Thu, May 29, 2014 at 3:40 PM, David Pilato david@pilato.fr wrote:
What do the older Marvel metrics show?
What does the field data memory look like?
--
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
On 29 May 2014, at 13:53, Jorge Ferrando jorfermo@gmail.com wrote:
There are recent entries in the log (like 15 mins ago) about gc/young/old
[2014-05-29 13:37:34,183][INFO ][monitor.jvm ] [elastic ASIC nodo 3] [gc][young][38][5] duration [763ms], collections [1]/[1s], total [763ms]/[2.3s], memory [609.6mb]->[166.3mb]/[29.9gb], all_pools {[young] [528.7mb]->[29.8mb]/[532.5mb]}{[survivor] [64.3mb]->[66.5mb]/[66.5mb]}{[old] [16.5mb]->[69.9mb]/[29.3gb]}
[2014-05-29 13:51:17,798][INFO ][monitor.jvm ] [elastic ASIC nodo 3] [gc][young][846][205] duration [727ms], collections [1]/[1.6s], total [727ms]/[1.1m], memory [4.2gb]->[4.2gb]/[29.9gb], all_pools {[young] [11.3mb]->[8.2mb]/[532.5mb]}{[survivor] [66.5mb]->[51.1mb]/[66.5mb]}{[old] [4.2gb]->[4.2gb]/[29.3gb]}
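Those monitor.jvm lines are easy to scan programmatically for long pauses; a minimal sketch, assuming the 1.x log format shown in the excerpts above (the sample line is abbreviated from the log):

```python
import re

# Pull collector name and pause duration out of an ES 1.x monitor.jvm log line.
line = ("[2014-05-29 13:51:17,798][INFO ][monitor.jvm ] [elastic ASIC nodo 3] "
        "[gc][young][846][205] duration [727ms], collections [1]/[1.6s], ...")

m = re.search(r"\[gc\]\[(young|old)\]\[\d+\]\[\d+\] duration \[([\d.]+)(ms|s)\]", line)
if m:
    collector, value, unit = m.groups()
    seconds = float(value) / 1000 if unit == "ms" else float(value)
    print(f"{collector} GC pause: {seconds:.3f}s")
```

Run over the whole log file, this makes it easy to see whether old-gen pauses (like the 29.5s one below) cluster right before each restart.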
On Thu, May 29, 2014 at 1:51 PM, Jorge Ferrando jorfermo@gmail.com wrote:
What could be the cause of that? An Elasticsearch update? A configuration parameter? What should I look for in the logs?
On Thu, May 29, 2014 at 10:51 AM, David Pilato david@pilato.fr wrote:
I think, though I might be wrong, that this node, being unresponsive, no longer collects GC data.
Maybe you could look further back in time, before things started getting worse.
--
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
On 29 May 2014, at 10:43, Jorge Ferrando jorfermo@gmail.com wrote:
This is what Marvel shows for old GC in the last 6 hours for that node:
<image.png>
On Thu, May 29, 2014 at 10:39 AM, David Pilato david@pilato.fr wrote:
It sounds like the old GC is not able to clean the old gen space enough.
I guess that if you look at your Marvel dashboards, you can see that in the old GC charts.
So memory pressure is the first guess. You may have too many old GC cycles.
--
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
On 29 May 2014, at 10:32, Jorge Ferrando jorfermo@gmail.com wrote:
Thanks for the answer David
I added these settings to elasticsearch.yml a few days ago to see if that was the problem:
discovery.zen.ping.timeout: 5s
discovery.zen.fd.ping_interval: 5s
discovery.zen.fd.ping_timeout: 60s
discovery.zen.fd.ping_retries: 3
If I'm not mistaken, with those settings the node should be marked as unavailable after about 3 minutes, and most of the time it happens quicker. Am I wrong?
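For what it's worth, a back-of-the-envelope check of that 3-minute figure, assuming the usual fault-detection semantics (each probe can take up to ping_timeout, and the node is dropped after ping_retries consecutive timed-out probes):

```python
# Rough worst-case failure-detection window implied by the settings above.
ping_interval = 5   # discovery.zen.fd.ping_interval (seconds)
ping_timeout = 60   # discovery.zen.fd.ping_timeout (seconds)
ping_retries = 3    # discovery.zen.fd.ping_retries

# Each of the 3 retries can block for the full 60s timeout before failing.
worst_case = ping_retries * ping_timeout
print(f"node marked as failed after up to ~{worst_case}s ({worst_case / 60:.0f} minutes)")
```

So the 3-minute estimate looks right in the worst case; a node that answers probes late (e.g. mid-GC) would be dropped faster or slower depending on when the timeouts start.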
On Thu, May 29, 2014 at 10:29 AM, David Pilato david@pilato.fr wrote:
GC took too much time, so your node became unresponsive, I think.
If you set a 30 GB heap, you should increase the ping timeout settings before a node is marked as unresponsive.
And if you are under memory pressure, you could review your requests and see whether some of them can be optimized, or start new nodes...
My 2 cents.
--
David
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
On 29 May 2014, at 09:56, Jorge Ferrando jorfermo@gmail.com wrote:
I've been analyzing the problem with Marvel and Nagios, and I managed to get two more details:
- The node that restarts/reinitializes is always the same: node 3
- It always happens shortly after the cluster reaches green state, between a few seconds and 2-3 minutes later
I have debug mode on in logging.yml:
logger:
  # log action execution errors for easier debugging
  action: DEBUG
But I don't see anything in the log. For instance, the last time it happened, the cluster became green at around 9:47 and the node restarted at 9:50:
[2014-05-29 09:30:57,235][INFO ][monitor.jvm ] [elastic ASIC nodo 3] [gc][young][129][20] duration [745ms], collections [1]/[1s], total [745ms]/[8.5s], memory [951.1mb]->[598.9mb]/[29.9gb], all_pools {[young] [421.5mb]->[8.2mb]/[532.5mb]}{[survivor] [66.5mb]->[66.5mb]/[66.5mb]}{[old] [463.1mb]->[524.1mb]/[29.3gb]}
[2014-05-29 09:45:36,322][WARN ][monitor.jvm ] [elastic ASIC nodo 3] [gc][old][964][1] duration [29.5s], collections [1]/[30.4s], total [29.5s]/[29.5s], memory [5.1gb]->[4.3gb]/[29.9gb], all_pools {[young] [29.4mb]->[34.9mb]/[532.5mb]}{[survivor] [59.9mb]->[0b]/[66.5mb]}{[old] [5gb]->[4.2gb]/[29.3gb]}
[2014-05-29 09:50:41,040][INFO ][node ] [elastic ASIC nodo 3] version[1.2.0], pid[7021], build[c82387f/2014-05-22T12:49:13Z]
[2014-05-29 09:50:41,041][INFO ][node ] [elastic ASIC nodo 3] initializing ...
[2014-05-29 09:50:41,063][INFO ][plugins ] [elastic ASIC nodo 3] loaded [marvel], sites [marvel, paramedic, inquisitor, HQ, bigdesk, head]
[2014-05-29 09:50:47,908][INFO ][node ] [elastic ASIC nodo 3] initialized
[2014-05-29 09:50:47,909][INFO ][node ] [elastic ASIC nodo 3] starting ...
Is there any other way of debugging what's going on with that node?
On Tue, May 27, 2014 at 12:49 PM, Jorge Ferrando jorfermo@gmail.com wrote:
I thought about that, but it would be strange: they are 3 virtual machines in the same VMware cluster with hundreds of other services, and nobody has reported any networking problem.
On Thu, May 22, 2014 at 3:16 PM, emeschitc emeschitc@gmail.com wrote:
Hi,
I may be wrong, but it seems to me you have a problem with your network. It may be a flaky connection, a broken NIC, or something wrong with your configuration for discovery and/or data transport:
Caused by: org.elasticsearch.transport.NodeNotConnectedException: [elastic ASIC nodo 2][inet[/158.42.250.79:9301]] Node not connected
at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:859)
at org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:540)
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:189)
Check the status of the network on this node.
On Thu, May 22, 2014 at 2:07 PM, Jorge Ferrando [via ElasticSearch Users] <[hidden email]> wrote:
Hello
We have a cluster of 3 nodes running Ubuntu 12.04.4 LTS 64bits, and elasticsearch v1.1.1
It had been running flawlessly, but since last week some of the nodes restart randomly and the cluster goes to red state, then yellow, then green, and then it happens again in a loop (sometimes it doesn't even reach green state).
I've tried to look at the logs, but I can't find an obvious reason for what might be going on.
I've found entries like these, but I don't know if they are in some way related to the crash:
[2014-05-22 13:55:16,150][WARN ][index.codec ] [elastic ASIC nodo 3] [logstash-2014.05.22] no index mapper found for field: [date_end] returning default postings format
[2014-05-22 13:55:16,151][WARN ][index.codec ] [elastic ASIC nodo 3] [logstash-2014.05.22] no index mapper found for field: [date_end.raw] returning default postings format
[2014-05-22 13:55:16,151][WARN ][index.codec ] [elastic ASIC nodo 3] [logstash-2014.05.22] no index mapper found for field: [date_start] returning default postings format
[2014-05-22 13:55:16,151][WARN ][index.codec ] [elastic ASIC nodo 3] [logstash-2014.05.22] no index mapper found for field: [date_start.raw] returning default postings format
For instance, right now it was in yellow state, really close to reaching green, when node 3 suddenly restarted itself; now the cluster is red with 2000 shards initializing. The log on that node shows this:
[2014-05-22 13:59:48,498][INFO ][monitor.jvm ] [elastic ASIC nodo 3] [gc][young][1181][222] duration [735ms], collections [1]/[1s], total [735ms]/[1.1m], memory [6.5gb]->[6.1gb]/[19.9gb], all_pools {[young] [456mb]->[7.2mb]/[532.5mb]}{[survivor] [66.5mb]->[66.5mb]/[66.5mb]}{[old] [6gb]->[6gb]/[19.3gb]}
[2014-05-22 14:03:44,825][INFO ][node ] [elastic ASIC nodo 3] version[1.1.1], pid[7511], build[f1585f0/2014-04-16T14:27:12Z]
[2014-05-22 14:03:44,826][INFO ][node ] [elastic ASIC nodo 3] initializing ...
[2014-05-22 14:03:44,839][INFO ][plugins ] [elastic ASIC nodo 3] loaded [], sites [paramedic, inquisitor, HQ, bigdesk, head]
[2014-05-22 14:03:51,967][INFO ][node ] [elastic ASIC nodo 3] initialized
[2014-05-22 14:03:51,967][INFO ][node ] [elastic ASIC nodo 3] starting ...
The crash happened exactly at 14:02.
Any idea what could be going on, or how I can trace what's happening?
After rebooting there are also DEBUG errors like this:
[2014-05-22 14:06:16,621][DEBUG][action.search.type ] [elastic ASIC nodo 3] [logstash-2014.05.21][1], node[jgwbxcBoTVa3JIIG5a_FJA], [P], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@42b80f4a] lastShard [true]
org.elasticsearch.transport.SendRequestTransportException: [elastic ASIC nodo 2][inet[/158.42.250.79:9301]][search/phase/query]
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:202)
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:173)
at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:208)
at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryThenFetchAction.java:80)
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:216)
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:203)
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.start(TransportSearchTypeAction.java:143)
at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:59)
at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:49)
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:63)
at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:108)
at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:43)
at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:63)
at org.elasticsearch.client.node.NodeClient.execute(NodeClient.java:92)
at org.elasticsearch.client.support.AbstractClient.search(AbstractClient.java:212)
at org.elasticsearch.rest.action.search.RestSearchAction.handleRequest(RestSearchAction.java:98)
at org.elasticsearch.rest.RestController.executeHandler(RestController.java:159)
at org.elasticsearch.rest.RestController.dispatchRequest(RestController.java:142)
at org.elasticsearch.http.HttpServer.internalDispatchRequest(HttpServer.java:121)
at org.elasticsearch.http.HttpServer$Dispatcher.dispatchRequest(HttpServer.java:83)
at org.elasticsearch.http.netty.NettyHttpServerTransport.dispatchRequest(NettyHttpServerTransport.java:291)
at org.elasticsearch.http.netty.HttpRequestHandler.messageReceived(HttpRequestHandler.java:43)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.elasticsearch.common.netty.handler.codec.http.HttpChunkAggregator.messageReceived(HttpChunkAggregator.java:145)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:459)
at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.callDecode(ReplayingDecoder.java:536)
at org.elasticsearch.common.netty.handler.codec.replay.ReplayingDecoder.messageReceived(ReplayingDecoder.java:435)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.elasticsearch.common.netty.OpenChannelsHandler.handleUpstream(OpenChannelsHandler.java:74)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: org.elasticsearch.transport.NodeNotConnectedException: [elastic ASIC nodo 2][inet[/158.42.250.79:9301]] Node not connected
at org.elasticsearch.transport.netty.NettyTransport.nodeChannel(NettyTransport.java:859)
at org.elasticsearch.transport.netty.NettyTransport.sendRequest(NettyTransport.java:540)
at org.elasticsearch.transport.TransportService.sendRequest(TransportService.java:189)
... 50 more
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/fa53a41d-064b-4250-8003-31cf845b7216%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
View this message in context: Re: Nodes restarting automatically
Sent from the ElasticSearch Users mailing list archive at Nabble.com.