My last test ran into similar problems, even though a master was available.
Let me briefly explain the scenario: a 2-node ES cluster, node 1 (isetta) has
less heap configured, node 2 (amnesia) has much more heap. The application
event-collector@amnesia uses the node client and sends bulk requests. The test
ran for several hours, but isetta ran into a heap issue. Here is the
event-collector application log:
isetta runs into problems and the application hangs. The other node, amnesia,
is still available.
2014-11-29 07:09:28,546 INFO
[elasticsearch[event-collector/27768@amnesia][clusterService#updateTask][T#1]]
org.elasticsearch.cluster.service: [event-collector/27768@amnesia] removed
{[isetta][tKL4oB8mR0Kaj8cqLO4nGw][ISETTA][inet[/139.2.247.65:9300]],},
reason: zen-disco-receive(from master
[[amnesia][gvcFKU8KSjSbKFHB3yNybQ][amnesia][inet[/139.2.246.36:9300]]])
2014-11-29 07:09:53,958 INFO
[elasticsearch[event-collector/27768@amnesia][clusterService#updateTask][T#1]]
org.elasticsearch.cluster.service: [event-collector/27768@amnesia] added
{[isetta][tKL4oB8mR0Kaj8cqLO4nGw][ISETTA][inet[/139.2.247.65:9300]],},
reason: zen-disco-receive(from master
[[amnesia][gvcFKU8KSjSbKFHB3yNybQ][amnesia][inet[/139.2.246.36:9300]]])
Much later I terminated node 1 (isetta) by killing the process:
2014-11-29 09:45:00,590 WARN
[elasticsearch[event-collector/27768@amnesia][transport_client_worker][T#3]{New
I/O worker #5}] org.elasticsearch.transport.netty:
[event-collector/27768@amnesia] exception caught on transport layer [[id:
0x36217255, /139.2.246.36:54716 => /139.2.247.65:9300]], closing connection
java.io.IOException: An existing connection was forcibly closed by the remote host
at sun.nio.ch.SocketDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:43)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
at sun.nio.ch.IOUtil.read(IOUtil.java:192)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:108)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:318)
at org.elasticsearch.common.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:89)
at org.elasticsearch.common.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
at org.elasticsearch.common.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
at org.elasticsearch.common.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
(...)
2014-11-29 09:45:02,509 INFO
[elasticsearch[event-collector/27768@amnesia][clusterService#updateTask][T#1]]
org.elasticsearch.cluster.service: [event-collector/27768@amnesia] removed
{[isetta][tKL4oB8mR0Kaj8cqLO4nGw][ISETTA][inet[/139.2.247.65:9300]],},
reason: zen-disco-receive(from master
[[amnesia][gvcFKU8KSjSbKFHB3yNybQ][amnesia][inet[/139.2.246.36:9300]]])
2014-11-29 09:45:02,571 INFO
[elasticsearch[event-collector/27768@amnesia][clusterService#updateTask][T#1]]
org.elasticsearch.cluster.service: [event-collector/27768@amnesia] removed
{[isetta][tKL4oB8mR0Kaj8cqLO4nGw][ISETTA][inet[/139.2.247.65:9300]],},
reason: zen-disco-receive(from master
[[amnesia][gvcFKU8KSjSbKFHB3yNybQ][amnesia][inet[/139.2.246.36:9300]]])
Now the failing node is removed, but the application still hangs. The default
ES configuration is used (I only changed the cluster name), and there are also
no settings on the node client (except the cluster name). Can you give a hint
on how I should configure the application client?
Markus
This is expected behavior.
When there are not enough master-eligible nodes and the cluster nodes are
waiting for a new master, the cluster is blocked, and all clients either hang
or get a SERVICE_UNAVAILABLE ClusterBlockException after a timeout.
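For example, instead of blocking indefinitely on the bulk future, you can bound
the wait and treat the block as a retryable failure. This is only a rough
sketch against the 1.x Java API; the 30-second bound and the error handling are
placeholders, not recommended values:

import org.elasticsearch.ElasticsearchException;
import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.common.unit.TimeValue;

public final class BoundedBulk {

    // bulkRequest comes from the existing event-collector code
    static BulkResponse executeWithTimeout(BulkRequestBuilder bulkRequest) {
        try {
            // wait at most 30 seconds instead of forever
            return bulkRequest.execute().actionGet(TimeValue.timeValueSeconds(30));
        } catch (ElasticsearchException e) {
            // ClusterBlockException (no master) or a timeout:
            // back off and retry later, or fail the batch explicitly
            return null;
        }
    }
}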
On the client side, you can tune the fault detection response timeout in the
discovery settings (node client) or the TCP timeouts (transport client) in
order to continue.
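A rough sketch of both variants (1.x Java API; the cluster name, host, and
timeout values below are placeholders, not recommendations):

import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.ImmutableSettings;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.node.Node;
import org.elasticsearch.node.NodeBuilder;

public final class ClientFactory {

    // Node client: tighten zen fault detection so a lost master is noticed sooner.
    static Client nodeClient() {
        Settings settings = ImmutableSettings.settingsBuilder()
            .put("cluster.name", "my-cluster")            // placeholder cluster name
            .put("discovery.zen.fd.ping_timeout", "10s")  // default is 30s
            .put("discovery.zen.fd.ping_retries", 3)      // default is 3
            .build();
        Node node = NodeBuilder.nodeBuilder()
            .settings(settings)
            .client(true)   // client node: holds no data, is never master
            .node();
        return node.client();
    }

    // Transport client: shorter connect/ping timeout per node.
    static Client transportClient() {
        Settings settings = ImmutableSettings.settingsBuilder()
            .put("cluster.name", "my-cluster")
            .put("client.transport.ping_timeout", "5s")   // default is 5s
            .build();
        return new TransportClient(settings)
            .addTransportAddress(new InetSocketTransportAddress("amnesia", 9300));
    }
}

Note that shorter timeouts only control how quickly the client notices the
failure; while no master is elected, bulk requests will still be rejected with
a ClusterBlockException.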
Jörg
On Fri, Nov 28, 2014 at 3:09 PM, <msbr...@gmail.com> wrote:
While testing how to handle ES cluster connectivity issues, I ran into a
serious problem. The Java API node client is connected, and then the ES server
is killed. The application hangs in a bulk request, and this call never
returns. It also does not return even after the cluster is started again. On
the console this exception is shown:
Exception in thread
"elasticsearch[event-collector/12240@amnesia][generic][T#2]"
org.elasticsearch.cluster.block.ClusterBlockException: blocked by:
[SERVICE_UNAVAILABLE/1/state not recovered /
initialized];[SERVICE_UNAVAILABLE/2/no master];
at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedException(ClusterBlocks.java:138)
at org.elasticsearch.cluster.block.ClusterBlocks.globalBlockedRaiseException(ClusterBlocks.java:128)
at org.elasticsearch.action.bulk.TransportBulkAction.executeBulk(TransportBulkAction.java:197)
at org.elasticsearch.action.bulk.TransportBulkAction.access$000(TransportBulkAction.java:65)
at org.elasticsearch.action.bulk.TransportBulkAction$1.onFailure(TransportBulkAction.java:143)
at org.elasticsearch.action.support.TransportAction$ThreadedActionListener$2.run(TransportAction.java:119)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I am surprised that this scenario does not work. Every other scenario, e.g.
shutting down 1 of 2 nodes, is handled transparently. But now the client
application seems to hang forever.
Any ideas?
regards,
markus