ES Java API: how to handle connectivity problems?

While testing how to handle ES cluster connectivity issues I ran into a
serious problem. The Java API node client is connected, and then the ES
server is killed. The application hangs in a bulk request; the call
never returns. It does not return even after the cluster is started again.
This exception is shown on the console:

Exception in thread "elasticsearch[event-collector/12240@amnesia][generic][T#2]"
org.elasticsearch.cluster.block.ClusterBlockException: blocked by:
[SERVICE_UNAVAILABLE/1/state not recovered / initialized];[SERVICE_UNAVAILABLE/2/no master];
    at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:138)
    at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:128)
    at org.elasticsearch.action.bulk.TransportBulkAction.executeBulk(TransportBulkAction.java:197)
    at org.elasticsearch.action.bulk.TransportBulkAction.access$000(TransportBulkAction.java:65)
    at org.elasticsearch.action.bulk.TransportBulkAction$1.onFailure(TransportBulkAction.java:143)
    at org.elasticsearch.action.support.TransportAction$ThreadedActionListener$2.run(TransportAction.java:119)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

I am surprised that this scenario does not work. Every other scenario, e.g.
shutting down one of two nodes, is handled transparently. But here the
client application seems to hang forever.

Any ideas?

regards,
markus

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/cdb234b8-d90d-41ad-b586-4150c5e80dbc%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

That's strange.

Could it be a problem in your code?
Something like looping forever?

You can set a timeout on the bulk request, but there is a default timeout
of one minute.

Maybe some code will help.
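The timeout can be passed to the blocking call itself: in the 1.x Java API,
ActionFuture has actionGet(TimeValue) and actionGet(long, TimeUnit) overloads
in addition to the unbounded actionGet(). A minimal sketch of the difference,
using a plain CompletableFuture as a stand-in for the ES action future (the
stand-in and class name are illustrative, not the client API):

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class BoundedWaitSketch {
    public static void main(String[] args) throws Exception {
        // Stand-in for an ES action future whose response never arrives,
        // e.g. because the cluster is blocked with "no master".
        CompletableFuture<String> response = new CompletableFuture<>();

        try {
            // Analogous to irb.execute().actionGet() with no timeout,
            // which would block forever. A bounded wait returns control
            // to the caller, which can then retry or fail the request.
            response.get(100, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            System.out.println("request timed out; caller can retry");
        }
    }
}
```

With the real client the equivalent would be something like
irb.execute().actionGet(TimeValue.timeValueSeconds(30)), which throws the
client's timeout exception instead of TimeoutException.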

On Friday, November 28, 2014 3:09:37 PM UTC+1, msbr...@gmail.com wrote:


There is nothing special in the code. Initially the node client (not the
transport client) is created.

Then the indexing thread calls something like:

irb = client.prepareBulk(...)

or

irb = client.prepareIndex(...)

Finally, irb.execute().actionGet() is invoked. With a running cluster this
code performs very well. The failover when killing one node of the cluster
also works seamlessly. But when the whole cluster goes offline, my
application cannot reconnect. As I posted, some internals are logged (see
above); the stack trace occurs about 2-3 times. After that the client is
not functional anymore.

I fenced the execute().actionGet() with try/catch (Exception and Throwable),
but nothing is caught. The request keeps blocking. I terminated the process
after 20 minutes of waiting.

In any other case, e.g. MappingException, IndexClosedException, ..., the
error handling works fine.

Are there any timeout settings available?

regards,
markus

On Friday, November 28, 2014 4:53:23 PM UTC+1, Georgi Ivanov wrote:


This is expected behavior.

When there are not enough master-eligible nodes and the cluster nodes are
waiting for a new master, the cluster is blocked, and all clients either
hang or get a SERVICE_UNAVAILABLE ClusterBlockException after a timeout.

On the client side, you can play with the fault detection response timeout
in the discovery settings (node client) or the TCP timeouts (transport
client) in order to continue.
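A hedged sketch of the knobs meant here, using the 1.x setting names (the
values are examples to tune, not recommendations):

```yaml
# Node client: zen discovery fault detection (how long to wait for ping
# responses before declaring a peer dead; 1.x defaults shown).
discovery.zen.fd.ping_timeout: 30s
discovery.zen.fd.ping_retries: 3

# Transport client: connection-level probing of the connected nodes.
client.transport.ping_timeout: 5s
client.transport.nodes_sampler_interval: 5s
```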

Jörg

On Fri, Nov 28, 2014 at 3:09 PM, msbreuer@gmail.com wrote:


My last test ran into similar problems, even though a master was available.
Let me briefly explain the scenario: a 2-node ES cluster; node 1 (isetta)
has less heap configured, node 2 (amnesia) has much more heap. The
application event-collector@amnesia uses the node client and sends bulk
requests. The test ran for several hours, but isetta ran into a heap issue.
Here is the event-collector application log:

isetta runs into the problem and the application hangs. Node 2 (amnesia) is
still available:
2014-11-29 07:09:28,546 INFO
[elasticsearch[event-collector/27768@amnesia][clusterService#updateTask][T#1]]
org.elasticsearch.cluster.service: [event-collector/27768@amnesia] removed
{[isetta][tKL4oB8mR0Kaj8cqLO4nGw][ISETTA][inet[/139.2.247.65:9300]],},
reason: zen-disco-receive(from master
[[amnesia][gvcFKU8KSjSbKFHB3yNybQ][amnesia][inet[/139.2.246.36:9300]]])
2014-11-29 07:09:28,546 INFO
[elasticsearch[event-collector/27768@amnesia][clusterService#updateTask][T#1]]
org.elasticsearch.cluster.service: [event-collector/27768@amnesia] removed
{[isetta][tKL4oB8mR0Kaj8cqLO4nGw][ISETTA][inet[/139.2.247.65:9300]],},
reason: zen-disco-receive(from master
[[amnesia][gvcFKU8KSjSbKFHB3yNybQ][amnesia][inet[/139.2.246.36:9300]]])
2014-11-29 07:09:53,958 INFO
[elasticsearch[event-collector/27768@amnesia][clusterService#updateTask][T#1]]
org.elasticsearch.cluster.service: [event-collector/27768@amnesia] added
{[isetta][tKL4oB8mR0Kaj8cqLO4nGw][ISETTA][inet[/139.2.247.65:9300]],},
reason: zen-disco-receive(from master
[[amnesia][gvcFKU8KSjSbKFHB3yNybQ][amnesia][inet[/139.2.246.36:9300]]])
2014-11-29 07:09:53,958 INFO
[elasticsearch[event-collector/27768@amnesia][clusterService#updateTask][T#1]]
org.elasticsearch.cluster.service: [event-collector/27768@amnesia] added
{[isetta][tKL4oB8mR0Kaj8cqLO4nGw][ISETTA][inet[/139.2.247.65:9300]],},
reason: zen-disco-receive(from master
[[amnesia][gvcFKU8KSjSbKFHB3yNybQ][amnesia][inet[/139.2.246.36:9300]]])

Much later I terminated node 1 (isetta) by killing the process:
2014-11-29 09:45:00,590 WARN
[elasticsearch[event-collector/27768@amnesia][transport_client_worker][T#3]{New
I/O worker #5}] org.elasticsearch.transport.netty:
[event-collector/27768@amnesia] exception caught on transport layer [[id:
0x36217255, /139.2.246.36:54716 => /139.2.247.65:9300]], closing connection
java.io.IOException: Eine vorhandene Verbindung wurde vom Remotehost
geschlossen (an existing connection was closed by the remote host)
    at sun.nio.ch.SocketDispatcher.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
    at sun.nio.ch.IOUtil.read(IOUtil.java:192)
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
    at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
    at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
(...)
2014-11-29 09:45:02,509 INFO
[elasticsearch[event-collector/27768@amnesia][clusterService#updateTask][T#1]]
org.elasticsearch.cluster.service: [event-collector/27768@amnesia] removed
{[isetta][tKL4oB8mR0Kaj8cqLO4nGw][ISETTA][inet[/139.2.247.65:9300]],},
reason: zen-disco-receive(from master
[[amnesia][gvcFKU8KSjSbKFHB3yNybQ][amnesia][inet[/139.2.246.36:9300]]])
2014-11-29 09:45:02,571 INFO
[elasticsearch[event-collector/27768@amnesia][clusterService#updateTask][T#1]]
org.elasticsearch.cluster.service: [event-collector/27768@amnesia] removed
{[isetta][tKL4oB8mR0Kaj8cqLO4nGw][ISETTA][inet[/139.2.247.65:9300]],},
reason: zen-disco-receive(from master
[[amnesia][gvcFKU8KSjSbKFHB3yNybQ][amnesia][inet[/139.2.246.36:9300]]])

Now the failing node has been removed, but the application still hangs. The
ES default configuration is used (I changed the cluster name only), and
there are no settings on the node client (except the cluster name). Can you
give a hint how I should configure the application client?

Markus


Two nodes are not enough to form a reliable distributed system; such a
cluster is prone to split brain, because there is no algorithm that can
decide which node should continue as master in case of a single node
failure.

ES has some precautions built in to suspend execution in this case.

Please use at least 3 nodes, and an odd number of nodes, with an explicit
minimum_master_nodes setting for better split-brain prevention.
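For example, with 3 master-eligible nodes a majority of 2 must be reachable
before a master can be elected; in elasticsearch.yml (1.x zen discovery)
this would be:

```yaml
# Require a quorum (2 of 3 master-eligible nodes) for master election,
# so a partitioned minority cannot elect its own master.
discovery.zen.minimum_master_nodes: 2
```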

Jörg

On Sat, Nov 29, 2014 at 10:09 AM, msbreuer@gmail.com wrote:


Two nodes are not enough to form a distributed system; such a cluster is
prone to split brain, because there is no algorithm that can decide which
node should continue as master in case of a single node failure.

Agreed!

ES has some precautions built in to suspend execution in this case.

Please use at least 3 nodes, and an odd number of nodes, with an explicit
minimum_master_nodes setting for better split-brain prevention.

Okay! But what about the connectivity issue that started this thread? The
client application still hangs, and a client restart seems to solve the
issue. There is no cluster state change when restarting the app, is there?

Markus


The client may hang because, I assume, the cluster state became unavailable
and the minimum-master condition is no longer met. If you re-add the failed
node, the cluster state will become available again, and I think the client
will continue.

Jörg

On Sun, Nov 30, 2014 at 11:07 PM, msbreuer@gmail.com wrote:


No, in my case the client does not continue. The one-node cluster goes
offline and comes back, but the client still hangs. The master/data node
runs 1.4.0 and the Java client uses 1.3.1.

I tested several scenarios to take the cluster offline: killing it, and
letting the server run into OOM exceptions.


Are you using the node client or the transport client?

I recommend using timeout settings. By default no timeout is set, and the
client waits forever, in both normal and exceptional situations.

With the transport client, the fault detector will respond with an
exception when no node is connected.

Jörg
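Putting the suggestions together, the indexing thread can wrap each request
in a bounded wait and retry, instead of a single unbounded actionGet(). A
self-contained sketch of that pattern (the submit callable stands in for
irb.execute(); with the real client you would also catch the client's own
exceptions on timeout and failure):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class RetryLoopSketch {
    // Wait at most timeoutMillis per attempt, retrying up to maxAttempts
    // times, instead of blocking forever on a single call.
    static String submitWithRetry(Callable<Future<String>> submit,
                                  int maxAttempts,
                                  long timeoutMillis) throws Exception {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Future<String> response = submit.call();
            try {
                return response.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                response.cancel(true); // give up on this attempt, resubmit
            }
        }
        throw new TimeoutException("cluster unavailable after "
                + maxAttempts + " attempts");
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // Simulated request that succeeds immediately.
        String result = submitWithRetry(
                () -> pool.submit(() -> "indexed"), 3, 100);
        System.out.println(result); // prints "indexed"
        pool.shutdown();
    }
}
```

The point of the cancel-and-resubmit loop is that control returns to the
application between attempts, so it can log, back off, or fail cleanly
rather than hang on one request for 20 minutes.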

On Mon, Dec 1, 2014 at 6:48 PM, msbreuer@gmail.com wrote:


I am using the node client. Referring to
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/modules-discovery-zen.html
the timeout is limited, not infinite. How do you think I should set the
timeout?

Also I am confused: the server node came back, but the client does not
recover. Shouldn't it reconnect? Even if the client waits forever on a
request, it should reconnect, shouldn't it?
