I have a situation where if a node in our cluster dies (for whatever
reason) the client app experiences a surge in memory usage, full GCs, and
essentially dies.
I think this is because the client holds on to the connections for a while
before realising the node is dead.
Does this sound possible? And does anyone have tips for how to deal with
this? My thinking so far is:
- More memory
- A circuit-breaker pattern or some such to make sure the app disconnects
quicker when ES is not responding
But are there ways to configure the ES client to improve the behaviour here?
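The circuit-breaker idea mentioned above can be sketched in plain Java. This is an illustrative, self-contained sketch, not any particular library's API; the class and method names are made up for the example. After a threshold of consecutive failures the breaker "opens" and callers fail fast instead of queueing requests against a node that is not responding, which is what drives the memory surge.

```java
// Minimal circuit-breaker sketch (illustrative names, not a library API).
// Closed: requests pass through. Open: requests are rejected immediately
// until a cool-down elapses, after which a probe request is allowed.
class CircuitBreaker {
    private final int threshold;          // consecutive failures before opening
    private final long retryAfterMillis;  // cool-down before allowing a probe
    private int consecutiveFailures = 0;
    private long openedAt = -1;

    CircuitBreaker(int threshold, long retryAfterMillis) {
        this.threshold = threshold;
        this.retryAfterMillis = retryAfterMillis;
    }

    synchronized boolean allowRequest(long nowMillis) {
        if (consecutiveFailures < threshold) {
            return true;  // closed: pass the request through
        }
        // open: only allow a probe once the cool-down has elapsed
        return nowMillis - openedAt >= retryAfterMillis;
    }

    synchronized void recordSuccess() {
        consecutiveFailures = 0;  // close the breaker again
    }

    synchronized void recordFailure(long nowMillis) {
        consecutiveFailures++;
        if (consecutiveFailures >= threshold) {
            openedAt = nowMillis;  // (re)start the cool-down on each failure
        }
    }
}
```

Wrapping each ES call in allowRequest/recordSuccess/recordFailure means that once a node stops responding, the app stops piling up in-flight requests against it instead of buffering them until the client notices the node is gone.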
It should not be possible, right? If you configure the client app with two
or more Elasticsearch nodes, it should detect that a node is down
and not use it during indexing/querying.
The problem only happens when the app is dealing with a high number of
requests. I wondered whether it was because the client takes a little bit
of time to detect that the node is unavailable: potentially up to 10
seconds in total with default settings (5 seconds to ping the node,
another 5 for the timeout).
And perhaps even after the node has been dropped, the existing connections
to the node still need to time out (not sure what the default is here)?
On Wednesday, 8 January 2014 13:19:29 UTC, Jason Wee wrote:
It should not be possible, right? If you configure the client app with two
or more Elasticsearch nodes, it should detect that a node is down
and not use it during indexing/querying.
Have you tried TransportClient? TransportClient does not share heap
memory with a cluster node. The setting "client.transport.ping_timeout"
controls the ping that checks whether connected nodes still respond. By
default it is 5 seconds; I use values up to 30 seconds to survive long GCs
without disconnects.
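For reference, a sketch of how that setting is applied when building a TransportClient with the ES 1.x-era Java API. The cluster name, host names, and the longer sampler interval are placeholders/assumptions for illustration; this needs the Elasticsearch client jar on the classpath.

```java
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

public class ClientBuilder {
    public static TransportClient build() {
        Settings settings = ImmutableSettings.settingsBuilder()
                .put("cluster.name", "my-cluster")              // placeholder
                .put("client.transport.ping_timeout", "30s")     // default is 5s
                .put("client.transport.nodes_sampler_interval", "10s")
                .build();
        // Give the client at least two nodes so it can route around a dead one
        return new TransportClient(settings)
                .addTransportAddress(new InetSocketTransportAddress("es-node-1", 9300))
                .addTransportAddress(new InetSocketTransportAddress("es-node-2", 9300));
    }
}
```

A longer ping_timeout trades slower dead-node detection for tolerance of long GC pauses on the node side, so it pulls in the opposite direction of the 10-second detection window discussed above; it is worth tuning both knobs together.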
We are using the transport client yes. And to clarify, ES itself is fine
during these periods. It is the client app that has problems.
On Wednesday, 8 January 2014 13:34:29 UTC, Jörg Prante wrote:
Have you tried TransportClient? TransportClient does not share the heap
memory with a cluster node. The setting "client.transport.ping_timeout"
checks if the nodes connected still respond. By default, it is 5 seconds, I
use values up to 30 seconds to survive long GCs without disconnects.
ES TransportClient uses a RetryListener, which is a bit flaky when
exceptions are caused by faulty nodes. Some users have reported an explosion
of port use and connection retries, which may also push client memory to its
limit. If you have stack traces showing abnormal behaviour, it would be
worth raising a GitHub issue.
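Whatever the client ends up doing internally during retries, one way to keep the memory surge bounded on the app side is to cap the number of in-flight requests and shed load above the cap, rather than letting requests queue up while the client is still deciding a node is dead. A minimal sketch in plain Java (the class name and limits are illustrative):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Caps in-flight requests so a slow or dead node produces fast rejections
// instead of an unbounded backlog of buffered requests on the client heap.
class RequestGate {
    private final Semaphore permits;
    private final long waitMillis;

    RequestGate(int maxInFlight, long waitMillis) {
        this.permits = new Semaphore(maxInFlight);
        this.waitMillis = waitMillis;
    }

    <T> T call(Callable<T> request) throws Exception {
        // Wait briefly for a slot; if none frees up, shed the request
        if (!permits.tryAcquire(waitMillis, TimeUnit.MILLISECONDS)) {
            throw new IllegalStateException("too many in-flight requests, shedding load");
        }
        try {
            return request.call();
        } finally {
            permits.release();
        }
    }
}
```

With the gate sized to what the heap can tolerate, a dead node turns into prompt "shedding load" errors the app can handle, instead of full GCs.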