Resolving a NoNodeAvailableException caused by large GCs

I recently hit a NoNodeAvailableException while running a search query.
I traced it to long GC pauses in the elasticsearch process, which left the
(one and only) node unresponsive. The pauses were around 10-30s long.

I was able to mitigate the problem by increasing both the client transport
ping timeout (client.transport.ping_timeout) and the nodes sampler interval
(client.transport.nodes_sampler_interval), as described at the end of
http://www.elasticsearch.org/guide/en/elasticsearch/client/java-api/current/client.html.
I also raised the sampler interval to be equal to or greater than the ping
timeout, since it doesn't make sense to ping a node a second time before the
first ping has had a chance to time out.
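
To make the relationship between the two values concrete, here is a rough
sketch in plain Java. The helper class and the margin parameter are
hypothetical and only there for illustration; the two setting names are the
ones from the client documentation. It derives a ping timeout from the
longest GC pause to tolerate and keeps the sampler interval at least as long.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of the Elasticsearch API.
final class TransportClientTimeouts {

    // Setting names used by the Java transport client.
    static final String PING_TIMEOUT = "client.transport.ping_timeout";
    static final String SAMPLER_INTERVAL = "client.transport.nodes_sampler_interval";

    /**
     * Derives both settings from the longest GC pause to tolerate.
     * marginSeconds is extra headroom on top of the pause (an assumption,
     * not a documented recommendation).
     */
    static Map<String, String> forGcPause(int longestPauseSeconds, int marginSeconds) {
        if (longestPauseSeconds < 0 || marginSeconds < 0) {
            throw new IllegalArgumentException("durations must be non-negative");
        }
        int pingTimeout = longestPauseSeconds + marginSeconds;
        // Keep the sampler interval at least as long as the ping timeout,
        // so a second ping is not sent before the first can time out.
        int samplerInterval = pingTimeout;

        Map<String, String> settings = new LinkedHashMap<>();
        settings.put(PING_TIMEOUT, pingTimeout + "s");
        settings.put(SAMPLER_INTERVAL, samplerInterval + "s");
        return settings;
    }
}
```

For the 30s pauses above and a 10s margin, forGcPause(30, 10) yields "40s"
for both settings. The resulting key/value pairs would then be handed to the
settings used to construct the transport client.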

Is this the correct solution given this problem? Is it necessary to
increase both the interval and timeout?

Rico

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/b24b9736-13f6-437d-897a-065d089ef604%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

+1 for this question

You should try to reduce stop-the-world GC as much as possible.

Increasing the timeout only hides the problem from you. You are fighting the
symptom, not the cause.
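
To see how much time a JVM is spending in GC, a minimal sketch using the
standard java.lang.management API could look like the following. The class
name is made up for illustration, and the figure is the accumulated
collection time reported by each collector, which is only an approximation
of stop-the-world pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical probe for illustration only.
final class GcTimeProbe {

    /** Approximate total milliseconds this JVM has spent in GC so far. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }
}
```

This only measures the JVM it runs in, so for the node itself it would have
to run inside the elasticsearch process or the same beans would have to be
read remotely.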

Jörg

On Tue, Apr 29, 2014 at 9:56 PM, Rico Chiu rico.chiu@gmail.com wrote:

I recently hit a NoNodeAvailableException while performing a search query.
I was able to deduce that the reason behind this was due to large GC events
occurring on the elasticsearch process, causing the (one and only) node to
be unresponsive. The GCs were around 10-30s long.

I was able to mitigate this problem by increasing both the client
transport ping timeout and nodes sampler interval setting values, as
described at the end of
http://www.elasticsearch.org/guide/en/elasticsearch/client/java-api/current/client.html.
I felt it was necessary to also increase the sampler value to equal or
greater than the timeout value, since it doesn't make sense to ping the
availability of a node a second time before the first one has the chance to
time out.

Is this the correct solution given this problem? Is it necessary to
increase both the interval and timeout?

Rico
