Memory surges in client app when a node dies


(nicolas.long) #1

Hi all,

I have a situation where if a node in our cluster dies (for whatever
reason) the client app experiences a surge in memory usage, full GCs, and
essentially dies.

I think this is because the client holds on to the connections for a while
before realising the node is dead.

Does this sound possible? And does anyone have tips for how to deal with
this? My thinking so far is:

  1. More memory

  2. A circuit-breaker pattern or some such to make sure the app disconnects
    quicker when ES is not responding
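
Something like this minimal sketch (plain Java, hypothetical class name,
arbitrary threshold and cool-off values) is what I have in mind for point 2:

```java
// Minimal circuit-breaker sketch: after N consecutive failures, stop
// attempting requests for a cool-off period instead of letting them
// pile up against a dead node.
class RequestCircuitBreaker {
    private final int failureThreshold;
    private final long openMillis;        // how long to stay open before retrying
    private int consecutiveFailures = 0;
    private long openedAt = -1;

    RequestCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    /** True if a request may be attempted right now. */
    synchronized boolean allowRequest(long nowMillis) {
        if (consecutiveFailures < failureThreshold) {
            return true;                              // closed: normal operation
        }
        // open: reject until the cool-off period has elapsed
        return nowMillis - openedAt >= openMillis;
    }

    synchronized void recordSuccess() {
        consecutiveFailures = 0;                      // close the breaker again
        openedAt = -1;
    }

    synchronized void recordFailure(long nowMillis) {
        consecutiveFailures++;
        if (consecutiveFailures == failureThreshold) {
            openedAt = nowMillis;                     // trip: start the cool-off clock
        }
    }
}
```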

But are there ways to configure the ES client to improve the behaviour here?

Thanks,

Nic

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/66c393a3-91d9-4314-a38f-e5267390b9b7%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Jason Wee) #2

That should not happen, right? If you configure the client app with two
or more Elasticsearch nodes, it should detect when a node is down
and stop using it for indexing and querying.

What client are you using?

Jason




(nicolas.long) #3

We're using the Java transport client.

The problem only happens when the app is dealing with a high number of
requests. I wondered whether it was because the client takes a little bit
of time to detect that the node is unavailable: potentially up to 10
seconds in total (with default settings - 5 seconds to ping the node,
another 5 for the timeout).

And perhaps even after the node has been dropped, the existing connections
to it still need to time out (I'm not sure what the default is here)?



(Jörg Prante) #4

Have you tried TransportClient? TransportClient does not share heap
memory with a cluster node. The setting "client.transport.ping_timeout"
controls how long the client waits for connected nodes to respond to pings.
The default is 5 seconds; I use values up to 30 seconds to survive long GCs
without disconnects.
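
For example, a sketch against the 1.x Java API (cluster name and hosts are
placeholders for your own):

```java
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

Settings settings = ImmutableSettings.settingsBuilder()
        .put("cluster.name", "my-cluster")              // placeholder cluster name
        .put("client.transport.ping_timeout", "30s")    // default is 5s
        .build();

TransportClient client = new TransportClient(settings)
        .addTransportAddress(new InetSocketTransportAddress("es-host-1", 9300))
        .addTransportAddress(new InetSocketTransportAddress("es-host-2", 9300));
```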

Jörg



(nicolas.long) #5

I think you probably replied just after mine!

We are using the transport client yes. And to clarify, ES itself is fine
during these periods. It is the client app that has problems.



(Jörg Prante) #6

ES TransportClient uses a RetryListener which can be a bit flaky when
exceptions are caused by faulty nodes. Some users have reported an explosion
of port usage and connection retries, which can also push client memory to
its limit. If you have stack traces showing abnormal behaviour, it may be
worth raising a GitHub issue.

Jörg


