Memory surges in client app when a node dies


(nicolas.long) #1

Hi all,

I have a situation where if a node in our cluster dies (for whatever
reason) the client app experiences a surge in memory usage, full GCs, and
essentially dies.

I think this is because the client holds on to the connections for a while
before realising the node is dead.

Does this sound possible? And does anyone have tips for how to deal with
this? My thinking so far is:

  1. More memory

  2. A circuit-breaker pattern or some such to make sure the app disconnects
    quicker when ES is not responding
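
Something like this minimal sketch (plain Java, hypothetical class name,
arbitrary threshold and cool-off values) is what I have in mind for point 2:

```java
// Minimal circuit-breaker sketch: after N consecutive failures, stop
// attempting requests for a cool-off period instead of letting them
// pile up against a dead node.
class RequestCircuitBreaker {
    private final int failureThreshold;
    private final long openMillis;        // how long to stay open before retrying
    private int consecutiveFailures = 0;
    private long openedAt = -1;

    RequestCircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    /** True if a request may be attempted right now. */
    synchronized boolean allowRequest(long nowMillis) {
        if (consecutiveFailures < failureThreshold) {
            return true;                              // closed: normal operation
        }
        // open: reject until the cool-off period has elapsed
        return nowMillis - openedAt >= openMillis;
    }

    synchronized void recordSuccess() {
        consecutiveFailures = 0;                      // close the breaker again
        openedAt = -1;
    }

    synchronized void recordFailure(long nowMillis) {
        consecutiveFailures++;
        if (consecutiveFailures == failureThreshold) {
            openedAt = nowMillis;                     // trip: start the cool-off clock
        }
    }
}
```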

But are there ways to configure the ES client to improve the behaviour here?

Thanks,

Nic

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/66c393a3-91d9-4314-a38f-e5267390b9b7%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Jason Wee) #2

That should not happen, right? If you configure the client app with two
or more Elasticsearch nodes, it should detect when a node is down
and stop using it for indexing and querying.

What client are you using?

Jason




(nicolas.long) #3

We're using the Java transport client.

The problem only happens when the app is dealing with a high number of
requests. I wondered whether it was because the client takes a little bit
of time to detect that the node is unavailable: potentially up to 10
seconds in total (with default settings - 5 seconds to ping the node,
another 5 for the timeout).

And perhaps even after the node has been dropped, the existing connections
to it still need to time out (I'm not sure what the default is here)?



(Jörg Prante) #4

Have you tried TransportClient? TransportClient does not share heap
memory with a cluster node. The setting "client.transport.ping_timeout"
controls how long the client waits for connected nodes to respond to pings.
The default is 5 seconds; I use values up to 30 seconds to survive long GCs
without disconnects.
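
For example, a sketch against the 1.x Java API (cluster name and hosts are
placeholders for your own):

```java
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;

Settings settings = ImmutableSettings.settingsBuilder()
        .put("cluster.name", "my-cluster")              // placeholder cluster name
        .put("client.transport.ping_timeout", "30s")    // default is 5s
        .build();

TransportClient client = new TransportClient(settings)
        .addTransportAddress(new InetSocketTransportAddress("es-host-1", 9300))
        .addTransportAddress(new InetSocketTransportAddress("es-host-2", 9300));
```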

Jörg



(nicolas.long) #5

I think you probably replied just after mine!

We are using the transport client yes. And to clarify, ES itself is fine
during these periods. It is the client app that has problems.



(Jörg Prante) #6

ES TransportClient uses a RetryListener which can be a bit flaky when
exceptions are caused by faulty nodes. Some users have reported an explosion
of port usage and connection retries, which can also push client memory to
its limit. If you have stack traces showing abnormal behaviour, it may be
worth raising a GitHub issue.

Jörg


