Client seems to block/hang when server hangs - v0.18.7

Hi,

We're seeing a problem where elastic Transport Clients hang. Can anyone
see what we're doing wrong?

We're running Elastic v0.18.7 with a cluster of 3 elastic nodes on
jdk1.7.0_02 on Windows 2003 64-bit servers.

We are sometimes getting an operating system problem which hangs one
of the elastic nodes. When this happens, the cluster behaves perfectly
well and rebalances as expected.

Unfortunately we still have a problem in the connected clients. The
calls into the elastic client appear to hang. We've created a gist
here https://gist.github.com/1705716 with 2 thread dumps attached. In
it you can see the problem getting worse - look at
"buckeroo.indexing.ElasticSearchClient.healthiness" calls. There are
41 in the first dump, then 56 in the second thread dump. We have set a
10 second timeout on these calls but this is not being triggered.

The snippet where we create the client is:

    TransportClient transportClient = new TransportClient(
        settingsBuilder()
            .put("cluster.name", clusterName.value)
            .put("client.transport.sniff", true)
            .build());
    for (HostAndPort hostAndPort : hostAndPorts) {
        transportClient.addTransportAddress(
            new InetSocketTransportAddress(hostAndPort.host.value, hostAndPort.port));
    }

thanks in advance for any help
-- mike

Agreed, we can do better at handling this case with the transport client (a node freezes and does not seem to respond at all). Opened an issue: "Transport Client: Improve remote node freeze handling by adding another timeout layer" (issue #1653 in elastic/elasticsearch on GitHub).
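
Until the client has such a timeout layer, one possible caller-side stopgap is to run the blocking call on a separate thread and give up waiting after a deadline. The following is a minimal sketch using only the JDK, not anything from Elasticsearch itself; `TimeoutGuard` and `callWithTimeout` are hypothetical names chosen for illustration, and the sleeping task merely stands in for a call that hangs.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper: bounds how long the caller waits for a blocking call.
public class TimeoutGuard {
    // Daemon threads, so a stuck worker cannot keep the JVM alive on shutdown.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "timeout-guard");
        t.setDaemon(true);
        return t;
    });

    public static <T> T callWithTimeout(Callable<T> call, long timeout, TimeUnit unit)
            throws TimeoutException, ExecutionException, InterruptedException {
        Future<T> future = POOL.submit(call);
        try {
            // The caller stops waiting once the deadline passes.
            return future.get(timeout, unit);
        } catch (TimeoutException e) {
            // Interrupt the worker; a call that ignores interrupts stays blocked.
            future.cancel(true);
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        // A fast call returns its result normally.
        String status = callWithTimeout(() -> "green", 1, TimeUnit.SECONDS);
        System.out.println(status);

        // A slow call (stand-in for a hung node) is abandoned after 100 ms.
        try {
            callWithTimeout(() -> { Thread.sleep(5_000); return "late"; },
                    100, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            System.out.println("timed out");
        }
    }
}
```

A health check like the one in the thread dumps could be passed in as the `Callable`. This only frees the calling thread: if the underlying call does not react to interruption, the worker threads can still accumulate, so it limits the symptom rather than fixing the cause.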


Hi Shay

thanks, that'd be great

-- mike

On Mon, Jan 30, 2012 at 8:08 PM, Shay Banon kimchy@gmail.com wrote:

Agreed, we can do better at handling this case with the transport
client (a node freezes and does not seem to respond at all). Opened an issue:
"Transport Client: Improve remote node freeze handling by adding another timeout layer" (issue #1653 in elastic/elasticsearch on GitHub).