Friendly greetings!
I have a 2-node cluster on EC2 (I also had a 5-node cluster with the same kind of problem).
Every week (sometimes once every two weeks, sometimes twice a week) a node crashes with this kind of error:
[2013-10-22 12:42:08,401][DEBUG][action.search.type ] [Aegis] [kibana-int][4], node[KkhYhIvsQN6QmiAJEJ9HzA], [R], s[STARTED]: Failed to execute [org.elasticsearch.action.search.SearchRequest@764568a2]
org.elasticsearch.search.SearchParseException: [kibana-int][4]: from[-1],size[-1]: Parse Failure [Failed to parse source
[{"facets":{"1":{"date_histogram":{"field":"@timestamp","interval":"10m"},"facet_filter":{"fquery":{"query":{"filtered":{"query":{"query_string":{"query":"startVideo"}},"filter":{"bool":{"must":[{"match_all":{}},{"range":{"@timestamp":{"from":1382272678894,"to":1382445478894}}},{"bool":{"must":[{"match_all":{}}]}}]}}}}}}},
"2":{"date_histogram":{"field":"@timestamp","interval":"10m"},"facet_filter":{"fquery":{"query":{"filtered":{"query":{"query_string":{"query":"quartile50"}},"filter":{"bool":{"must":[{"match_all":{}},{"range":{"@timestamp":{"from":1382272678894,"to":1382445478894}}},{"bool":{"must":[{"match_all":{}}]}}]}}}}}}},
"3":{"date_histogram":{"field":"@timestamp","interval":"10m"},"facet_filter":{"fquery":{"query":{"filtered":{"query":{"query_string":{"query":"playCompleted"}},"filter":{"bool":{"must":[{"match_all":{}},{"range":{"@timestamp":{"from":1382272678894,"to":1382445478894}}},{"bool":{"must":[{"match_all":{}}]}}]}}}}}}},
"0":{"date_histogram":{"field":"@timestamp","interval":"10m"},"facet_filter":{"fquery":{"query":{"filtered":{"query":{"query_string":{"query":"advertiserBillable"}},"filter":{"bool":{"must":[{"match_all":{}},{"range":{"@timestamp":{"from":1382272678894,"to":1382445478894}}},{"bool":{"must":[{"match_all":{}}]}}]}}}}}}},
"4":{"date_histogram":{"field":"@timestamp","interval":"10m"},"facet_filter":{"fquery":{"query":{"filtered":{"query":{"query_string":{"query":"seekStart"}},"filter":{"bool":{"must":[{"match_all":{}},{"range":{"@timestamp":{"from":1382272678894,"to":1382445478894}}},{"bool":{"must":[{"match_all":{}}]}}]}}}}}}}},"size":0}]]
at org.elasticsearch.search.SearchService.parseSource(SearchService.java:561)
at org.elasticsearch.search.SearchService.createContext(SearchService.java:464)
at org.elasticsearch.search.SearchService.createContext(SearchService.java:449)
at org.elasticsearch.search.SearchService.createAndPutContext(SearchService.java:442)
at org.elasticsearch.search.SearchService.executeQueryPhase(SearchService.java:214)
at org.elasticsearch.search.action.SearchServiceTransportAction.sendExecuteQuery(SearchServiceTransportAction.java:202)
at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.sendExecuteFirstPhase(TransportSearchQueryThenFetchAction.java:80)
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.performFirstPhase(TransportSearchTypeAction.java:216)
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.onFirstPhaseResult(TransportSearchTypeAction.java:293)
at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction$3.onFailure(TransportSearchTypeAction.java:224)
at org.elasticsearch.search.action.SearchServiceTransportAction$4.handleException(SearchServiceTransportAction.java:222)
at org.elasticsearch.transport.netty.MessageChannelHandler.handleException(MessageChannelHandler.java:180)
at org.elasticsearch.transport.netty.MessageChannelHandler.handlerResponseError(MessageChannelHandler.java:170)
at org.elasticsearch.transport.netty.MessageChannelHandler.messageReceived(MessageChannelHandler.java:122)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:296)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
at org.elasticsearch.common.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
at org.elasticsearch.common.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
at org.elasticsearch.common.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:268)
at org.elasticsearch.common.netty.channel.Channels.fireMessageReceived(Channels.java:255)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:679)
Caused by: org.elasticsearch.search.facet.FacetPhaseExecutionException: Facet [1]: (key) field [@timestamp] not found
at org.elasticsearch.search.facet.datehistogram.DateHistogramFacetParser.parse(DateHistogramFacetParser.java:160)
at org.elasticsearch.search.facet.FacetParseElement.parse(FacetParseElement.java:92)
at org.elasticsearch.search.SearchService.parseSource(SearchService.java:549)
... 35 more
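For what it's worth, this is how I check whether an index actually has a @timestamp field in its mapping (kibana-int here, since that's the shard named in the trace; localhost is just an example host, I run it on the node itself):

    curl -XGET 'http://localhost:9200/kibana-int/_mapping?pretty=true'

If I read the trace correctly, the date_histogram facet fails because @timestamp is not found in the mapping of the index it hits.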
That first exception is then followed by an out-of-memory error:
[2013-10-24 01:25:01,348][WARN ][index.merge.scheduler ] [Aegis] [logstash-2013.10.23][1] failed to merge
java.lang.OutOfMemoryError: Java heap space
[2013-10-24 01:25:01,349][WARN ][index.engine.robin ] [Aegis] [logstash-2013.10.23][1] failed engine
org.apache.lucene.index.MergePolicy$MergeException: java.lang.OutOfMemoryError: Java heap space
at org.elasticsearch.index.merge.scheduler.ConcurrentMergeSchedulerProvider$CustomConcurrentMergeScheduler.handleMergeException(ConcurrentMergeSchedulerProvider.java:99)
at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:518)
Caused by: java.lang.OutOfMemoryError: Java heap space
[2013-10-24 01:25:01,350][WARN ][cluster.action.shard ] [Aegis] sending failed shard for [logstash-2013.10.23][1], node[KkhYhIvsQN6QmiAJEJ9HzA], [P], s[INITIALIZING], reason [engine failure, message [MergeException[java.lang.OutOfMemoryError: Java heap space]; nested: OutOfMemoryError[Java heap space]; ]]
[2013-10-24 01:25:01,351][WARN ][cluster.action.shard ] [Aegis] received shard failed for [logstash-2013.10.23][1], node[KkhYhIvsQN6QmiAJEJ9HzA], [P], s[INITIALIZING], reason [engine failure, message [MergeException[java.lang.OutOfMemoryError: Java heap space]; nested: OutOfMemoryError[Java heap space]; ]]
[2013-10-24 04:17:34,585][WARN ][http.netty ] [Aegis] Caught exception while handling client http traffic, closing connection [id: 0x1650ccf8, /10.164.36.aaa:27357 => /10.80.141.bbb:9200]
java.lang.OutOfMemoryError: Java heap space
(I removed the last part of the IP addresses.)
The node never recovers; I have to kill -9 the java process.
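While the node is dying I can usually still hit the node stats API to watch the heap filling up, something like this (example host; I believe the jvm flag is accepted on 0.90.x, otherwise the plain stats call works too):

    curl -XGET 'http://localhost:9200/_nodes/stats?jvm=true&pretty=true'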
I have this config:
aws:
    access_key: ***
    secret_key: ***
node:
    auto_attributes: true
discovery:
    type: ec2
discovery.ec2.groups: elasticsearch
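(If I understand the cloud-aws plugin settings correctly, that should be equivalent to these flat keys; the cloud. prefix is my assumption, it isn't visible in the paste above:)

    cloud.aws.access_key: ***
    cloud.aws.secret_key: ***
    cloud.node.auto_attributes: true
    discovery.type: ec2
    discovery.ec2.groups: elasticsearch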
I had the S3 gateway too, but it never worked.
Here is the command line shown by ps:
/usr/lib/jvm/jre/bin/java -Xms4g -Xmx4g -Xss256k -Djava.awt.headless=true
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
-XX:+HeapDumpOnOutOfMemoryError -Delasticsearch
-Des.path.home=/home/elasticsearch/elasticsearch -cp
:/home/elasticsearch/elasticsearch/lib/elasticsearch-0.90.5.jar:/home/elasticsearch/elasticsearch/lib/:/home/elasticsearch/elasticsearch/lib/sigar/
org.elasticsearch.bootstrap.ElasticSearch
I edited elasticsearch.in.sh to add:
ES_HEAP_SIZE=4g
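(As far as I understand the stock elasticsearch.in.sh, that variable just gets copied into the min/max heap options, which matches the -Xms4g -Xmx4g I see in ps; roughly this part of the script:)

    if [ "x$ES_HEAP_SIZE" != "x" ]; then
        ES_MIN_MEM=$ES_HEAP_SIZE
        ES_MAX_MEM=$ES_HEAP_SIZE
    fi
    JAVA_OPTS="$JAVA_OPTS -Xms${ES_MIN_MEM}"
    JAVA_OPTS="$JAVA_OPTS -Xmx${ES_MAX_MEM}"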
Each node has 8GB of memory.
Since I switched to 2 nodes and configured the heap size correctly, the only node that crashes is the one hosting Kibana 3.
ElasticHQ shows these statistics:
2 Nodes
230 Total Shards
230 Successful Shards
23 Indices
427,207,138 Total Documents
425.1GB Total Size
I'm inserting around 100 docs/s through a load balancer that spreads them randomly across the 2 nodes. When one of the nodes crashes, the data being inserted is lost on both nodes until I restart it (so I have a nice hole in my data...).
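In case it matters, this is what I use to check the cluster state while one node is down (example host):

    curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'

My guess is that with primary shards unassigned, writes to those indices fail on both nodes, hence the hole, but I may be wrong.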
Any ideas?
Thank you
--
Laurent Laborde