I recently hit a NoNodeAvailableException while performing a search query.
I traced the cause to long garbage-collection (GC) pauses in the
Elasticsearch process, which left the (one and only) node unresponsive.
The pauses lasted around 10-30 s.
I was able to mitigate the problem by increasing both the client transport
ping timeout (client.transport.ping_timeout) and the nodes sampler interval
(client.transport.nodes_sampler_interval), as described at the end of
http://www.elasticsearch.org/guide/en/elasticsearch/client/java-api/current/client.html.
I also raised the sampler interval to a value equal to or greater than the
ping timeout, since it doesn't make sense to ping a node a second time
before the first ping has had a chance to time out.
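To make the interval-versus-timeout reasoning concrete, here is a minimal sketch of the rule being applied. The two setting keys are the ones named on the linked Java API page; everything else is an assumption for illustration. The class name, the toMillis parser (which only understands "ms", "s" and "m" suffixes) and the transportSettings helper are hypothetical and are not part of the Elasticsearch API. With the 1.x Java client, the same keys would be set on the Settings object passed to TransportClient. The sketch avoids that dependency and only builds the key/value pairs, rejecting an interval shorter than the timeout. The values 45s/60s are illustrative choices meant to outlast the observed 10-30 s pauses, not recommendations.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TransportClientTimeouts {

    // Hypothetical helper: parses a simple time value such as "500ms", "30s" or "1m"
    // into milliseconds. Only the ms, s and m suffixes are supported.
    public static long toMillis(String value) {
        String v = value.trim().toLowerCase();
        if (v.endsWith("ms")) { // check "ms" before "s", since "500ms" also ends with "s"
            return Long.parseLong(v.substring(0, v.length() - 2));
        } else if (v.endsWith("s")) {
            return Long.parseLong(v.substring(0, v.length() - 1)) * 1000L;
        } else if (v.endsWith("m")) {
            return Long.parseLong(v.substring(0, v.length() - 1)) * 60_000L;
        }
        throw new IllegalArgumentException("Unsupported time value: " + value);
    }

    // Hypothetical helper: builds the two client transport settings and rejects a
    // sampler interval shorter than the ping timeout, so a second ping is never
    // scheduled before the first one has had a chance to time out.
    public static Map<String, String> transportSettings(String pingTimeout, String samplerInterval) {
        if (toMillis(samplerInterval) < toMillis(pingTimeout)) {
            throw new IllegalArgumentException("nodes_sampler_interval (" + samplerInterval
                    + ") is shorter than ping_timeout (" + pingTimeout + ")");
        }
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("client.transport.ping_timeout", pingTimeout);
        settings.put("client.transport.nodes_sampler_interval", samplerInterval);
        return settings;
    }

    public static void main(String[] args) {
        // Illustrative values chosen to outlast 10-30 s GC pauses.
        Map<String, String> settings = transportSettings("45s", "60s");
        settings.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Running main prints the two key=value pairs. Calling transportSettings("30s", "5s") throws, because a 5 s interval would re-ping the node before a 30 s timeout could expire.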
Is this the correct solution to this problem? Is it necessary to increase
both the interval and the timeout?