I have a five-node cluster that was recently upgraded from ES 1.7 to ES 2.3.0.
Each node has 13 GB of memory dedicated to ES. I have observed that running a large search query causes a node to drop out of the cluster. The error logs report that zen ping is unable to reach the missing node. Restarting the missing node allows it to rejoin the cluster, but it then hangs while re-sharding.
Any help in troubleshooting and understanding this issue would be greatly appreciated. The relevant log excerpt from Node1:
[2016-05-02 08:17:30,447][WARN ][index.translog ] [Node1] [aggblip][3] unexpected error while checking whether the translog needs a flush. rescheduling
java.lang.OutOfMemoryError: Java heap space
[2016-05-02 08:17:30,447][DEBUG][action.search ] [Node1] [120] Failed to execute fetch phase
RemoteTransportException[[Failed to deserialize response of type [org.elasticsearch.search.fetch.FetchSearchResult]]]; nested: TransportSerializationException[Failed to deserialize response of type [org.elasticsearch.search.fetch.FetchSearchResult]]; nested: OutOfMemoryError[Java heap space];
Caused by: TransportSerializationException[Failed to deserialize response of type [org.elasticsearch.search.fetch.FetchSearchResult]]; nested: OutOfMemoryError[Java heap space];