ElasticSearch resiliency problem


(Sicker) #1

I am trying to reproduce an issue: after the connection between the nodes
fails for 5 minutes, ElasticSearch should rejoin the cluster, but it does not.

Is this a scenario from which ElasticSearch can recover automatically?

These are the logs from my environment:

*Node 1 Log*

[2012-07-17 15:52:40,988][INFO ][node ] [Uni-Mind]
{0.19.8}[11777]: initializing ...
[2012-07-17 15:52:41,015][INFO ][plugins ] [Uni-Mind]
loaded [analysis-icu], sites [bigdesk, head]
[2012-07-17 15:52:44,292][INFO ][node ] [Uni-Mind]
{0.19.8}[11777]: initialized
[2012-07-17 15:52:44,292][INFO ][node ] [Uni-Mind]
{0.19.8}[11777]: starting ...
[2012-07-17 15:52:44,435][INFO ][transport ] [Uni-Mind]
bound_address {inet[/0.0.0.0:9300]}, publish_address
{inet[/192.168.236.101:9300]}
[2012-07-17 15:52:47,487][INFO ][cluster.service ] [Uni-Mind]
new_master [Uni-Mind][DIVmfDo3TMG6gZpn-troGw][inet[/192.168.236.101:9300]],
reason: zen-disco-join (elected_as_master)
[2012-07-17 15:52:47,568][INFO ][discovery ] [Uni-Mind]
topicsearch/DIVmfDo3TMG6gZpn-troGw
[2012-07-17 15:52:47,605][INFO ][http ] [Uni-Mind]
bound_address {inet[/0.0.0.0:9200]}, publish_address
{inet[/192.168.236.101:9200]}
[2012-07-17 15:52:47,733][INFO ][gateway ] [Uni-Mind]
recovered [0] indices into cluster_state
[2012-07-17 15:52:47,786][INFO ][jmx ] [Uni-Mind]
bound_address {service:jmx:rmi:///jndi/rmi://:9400/jmxrmi}, publish_address
{service:jmx:rmi:///jndi/rmi://192.168.236.101:9400/jmxrmi}
[2012-07-17 15:52:47,786][INFO ][node ] [Uni-Mind]
{0.19.8}[11777]: started
[2012-07-17 15:53:11,223][INFO ][cluster.metadata ] [Uni-Mind]
[twitter] creating index, cause [api], shards [5]/[0], mappings []
[2012-07-17 15:54:59,850][INFO ][cluster.service ] [Uni-Mind]
added {[Doorman][mC4QbYBgSKi-uKcoysCBJg][inet[/192.168.236.102:9300]],},
reason: zen-disco-receive(join from
node[[Doorman][mC4QbYBgSKi-uKcoysCBJg][inet[/192.168.236.102:9300]]])
[2012-07-17 15:57:08,945][INFO ][cluster.service ] [Uni-Mind]
removed {[Doorman][mC4QbYBgSKi-uKcoysCBJg][inet[/192.168.236.102:9300]],},
reason:
zen-disco-node_failed([Doorman][mC4QbYBgSKi-uKcoysCBJg][inet[/192.168.236.102:9300]]),
reason failed to ping, tried [3] times, each with maximum [30s] timeout
[2012-07-17 15:57:08,979][DEBUG][action.admin.cluster.node.info] [Uni-Mind]
failed to execute on node [mC4QbYBgSKi-uKcoysCBJg]
org.elasticsearch.transport.NodeDisconnectedException:
[Doorman][inet[/192.168.236.102:9300]][cluster/nodes/info/n] disconnected
[2012-07-17 15:57:08,982][DEBUG][action.admin.cluster.node.stats]
[Uni-Mind] failed to execute on node [mC4QbYBgSKi-uKcoysCBJg]
org.elasticsearch.transport.NodeDisconnectedException:
[Doorman][inet[/192.168.236.102:9300]][cluster/nodes/stats/n] disconnected
[2012-07-17 15:57:08,986][DEBUG][action.admin.cluster.node.info] [Uni-Mind]
failed to execute on node [mC4QbYBgSKi-uKcoysCBJg]
org.elasticsearch.transport.NodeDisconnectedException:
[Doorman][inet[/192.168.236.102:9300]][cluster/nodes/info/n] disconnected
[2012-07-17 15:57:08,987][DEBUG][action.admin.indices.status] [Uni-Mind]
[twitter][1], node[mC4QbYBgSKi-uKcoysCBJg], [P], s[STARTED]: Failed to
execute
[org.elasticsearch.action.admin.indices.status.IndicesStatusRequest@78dee892]
org.elasticsearch.transport.NodeDisconnectedException:
[Doorman][inet[/192.168.236.102:9300]][indices/status/s] disconnected
[2012-07-17 15:57:08,998][DEBUG][action.admin.cluster.node.stats]
[Uni-Mind] failed to execute on node [mC4QbYBgSKi-uKcoysCBJg]
org.elasticsearch.transport.NodeDisconnectedException:
[Doorman][inet[/192.168.236.102:9300]][cluster/nodes/stats/n] disconnected
[2012-07-17 15:57:08,999][DEBUG][action.admin.indices.status] [Uni-Mind]
[twitter][1], node[mC4QbYBgSKi-uKcoysCBJg], [P], s[STARTED]: Failed to
execute
[org.elasticsearch.action.admin.indices.status.IndicesStatusRequest@45c9d650]
org.elasticsearch.transport.NodeDisconnectedException:
[Doorman][inet[/192.168.236.102:9300]][indices/status/s] disconnected
[2012-07-17 15:57:09,000][DEBUG][action.admin.cluster.node.stats]
[Uni-Mind] failed to execute on node [mC4QbYBgSKi-uKcoysCBJg]
org.elasticsearch.transport.NodeDisconnectedException:
[Doorman][inet[/192.168.236.102:9300]][cluster/nodes/stats/n] disconnected
[2012-07-17 15:57:09,001][DEBUG][action.admin.cluster.node.info] [Uni-Mind]
failed to execute on node [mC4QbYBgSKi-uKcoysCBJg]
org.elasticsearch.transport.NodeDisconnectedException:
[Doorman][inet[/192.168.236.102:9300]][cluster/nodes/info/n] disconnected
[2012-07-17 15:57:09,002][DEBUG][action.admin.indices.status] [Uni-Mind]
[twitter][0], node[mC4QbYBgSKi-uKcoysCBJg], [P], s[STARTED]: Failed to
execute
[org.elasticsearch.action.admin.indices.status.IndicesStatusRequest@2fbb1447]
org.elasticsearch.transport.NodeDisconnectedException:
[Doorman][inet[/192.168.236.102:9300]][indices/status/s] disconnected
[2012-07-17 15:57:09,023][DEBUG][action.admin.indices.status] [Uni-Mind]
[twitter][0], node[mC4QbYBgSKi-uKcoysCBJg], [P], s[STARTED]: Failed to
execute
[org.elasticsearch.action.admin.indices.status.IndicesStatusRequest@78dee892]
org.elasticsearch.transport.NodeDisconnectedException:
[Doorman][inet[/192.168.236.102:9300]][indices/status/s] disconnected
[2012-07-17 15:57:09,032][DEBUG][action.admin.indices.status] [Uni-Mind]
[twitter][0], node[mC4QbYBgSKi-uKcoysCBJg], [P], s[STARTED]: Failed to
execute
[org.elasticsearch.action.admin.indices.status.IndicesStatusRequest@45c9d650]
org.elasticsearch.transport.NodeDisconnectedException:
[Doorman][inet[/192.168.236.102:9300]][indices/status/s] disconnected
[2012-07-17 15:57:09,045][DEBUG][action.admin.indices.status] [Uni-Mind]
[twitter][1], node[mC4QbYBgSKi-uKcoysCBJg], [P], s[STARTED]: Failed to
execute
[org.elasticsearch.action.admin.indices.status.IndicesStatusRequest@2fbb1447]
org.elasticsearch.transport.NodeDisconnectedException:
[Doorman][inet[/192.168.236.102:9300]][indices/status/s] disconnected

*Node 2 Log*

[2012-07-17 15:54:52,932][INFO ][node ] [Doorman]
{0.19.8}[4078]: initializing ...
[2012-07-17 15:54:52,952][INFO ][plugins ] [Doorman]
loaded [analysis-icu], sites [bigdesk, head]
[2012-07-17 15:54:56,243][INFO ][node ] [Doorman]
{0.19.8}[4078]: initialized
[2012-07-17 15:54:56,243][INFO ][node ] [Doorman]
{0.19.8}[4078]: starting ...
[2012-07-17 15:54:56,421][INFO ][transport ] [Doorman]
bound_address {inet[/0.0.0.0:9300]}, publish_address
{inet[/192.168.236.102:9300]}
[2012-07-17 15:54:59,566][INFO ][cluster.service ] [Doorman]
detected_master
[Uni-Mind][DIVmfDo3TMG6gZpn-troGw][inet[/192.168.236.101:9300]], added
{[Uni-Mind][DIVmfDo3TMG6gZpn-troGw][inet[/192.168.236.101:9300]],}, reason:
zen-disco-receive(from master
[[Uni-Mind][DIVmfDo3TMG6gZpn-troGw][inet[/192.168.236.101:9300]]])
[2012-07-17 15:54:59,693][INFO ][discovery ] [Doorman]
topicsearch/mC4QbYBgSKi-uKcoysCBJg
[2012-07-17 15:54:59,699][INFO ][http ] [Doorman]
bound_address {inet[/0.0.0.0:9200]}, publish_address
{inet[/192.168.236.102:9200]}
[2012-07-17 15:54:59,853][INFO ][jmx ] [Doorman]
bound_address {service:jmx:rmi:///jndi/rmi://:9400/jmxrmi}, publish_address
{service:jmx:rmi:///jndi/rmi://192.168.236.102:9400/jmxrmi}
[2012-07-17 15:54:59,853][INFO ][node ] [Doorman]
{0.19.8}[4078]: started
[2012-07-17 15:57:08,663][INFO ][discovery.zen ] [Doorman]
master_left
[[Uni-Mind][DIVmfDo3TMG6gZpn-troGw][inet[/192.168.236.101:9300]]], reason
[failed to ping, tried [3] times, each with maximum [30s] timeout]
[2012-07-17 15:57:08,665][INFO ][cluster.service ] [Doorman]
master {new [Doorman][mC4QbYBgSKi-uKcoysCBJg][inet[/192.168.236.102:9300]],
previous [Uni-Mind][DIVmfDo3TMG6gZpn-troGw][inet[/192.168.236.101:9300]]},
removed {[Uni-Mind][DIVmfDo3TMG6gZpn-troGw][inet[/192.168.236.101:9300]],},
reason: zen-disco-master_failed
([Uni-Mind][DIVmfDo3TMG6gZpn-troGw][inet[/192.168.236.101:9300]])
[2012-07-17 16:14:36,802][INFO ][cluster.metadata ] [Doorman]
[test2] creating index, cause [api], shards [5]/[1], mappings []
[2012-07-17 16:14:55,617][INFO ][cluster.service ] [Doorman] added
{[Asmodeus][zOAExMUjRl2MiqLrn5lGMA][inet[/192.168.236.101:9300]],}, reason:
zen-disco-receive(join from
node[[Asmodeus][zOAExMUjRl2MiqLrn5lGMA][inet[/192.168.236.101:9300]]])
[2012-07-17 16:14:55,771][DEBUG][action.admin.cluster.node.stats] [Doorman]
failed to execute on node [zOAExMUjRl2MiqLrn5lGMA]
org.elasticsearch.transport.RemoteTransportException:
[Asmodeus][inet[/192.168.236.101:9300]][cluster/nodes/stats/n]
Caused by: java.lang.NullPointerException
at
org.elasticsearch.action.support.nodes.NodeOperationResponse.writeTo(NodeOperationResponse.java:66)
at
org.elasticsearch.action.admin.cluster.node.stats.NodeStats.writeTo(NodeStats.java:290)
at
org.elasticsearch.transport.support.TransportStreams.buildResponse(TransportStreams.java:137)
at
org.elasticsearch.transport.netty.NettyTransportChannel.sendResponse(NettyTransportChannel.java:77)
at
org.elasticsearch.transport.netty.NettyTransportChannel.sendResponse(NettyTransportChannel.java:68)
at
org.elasticsearch.action.support.nodes.TransportNodesOperationAction$NodeTransportHandler.messageReceived(TransportNodesOperationAction.java:276)
at
org.elasticsearch.action.support.nodes.TransportNodesOperationAction$NodeTransportHandler.messageReceived(TransportNodesOperationAction.java:267)
at
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:400)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
[2012-07-17 16:14:55,781][DEBUG][action.admin.cluster.node.info] [Doorman]
failed to execute on node [zOAExMUjRl2MiqLrn5lGMA]
org.elasticsearch.transport.RemoteTransportException:
[Asmodeus][inet[/192.168.236.101:9300]][cluster/nodes/info/n]
Caused by: java.lang.NullPointerException
at
org.elasticsearch.action.support.nodes.NodeOperationResponse.writeTo(NodeOperationResponse.java:66)
at
org.elasticsearch.action.admin.cluster.node.info.NodeInfo.writeTo(NodeInfo.java:285)
at
org.elasticsearch.transport.support.TransportStreams.buildResponse(TransportStreams.java:137)
at
org.elasticsearch.transport.netty.NettyTransportChannel.sendResponse(NettyTransportChannel.java:77)
at
org.elasticsearch.transport.netty.NettyTransportChannel.sendResponse(NettyTransportChannel.java:68)
at
org.elasticsearch.action.support.nodes.TransportNodesOperationAction$NodeTransportHandler.messageReceived(TransportNodesOperationAction.java:276)
at
org.elasticsearch.action.support.nodes.TransportNodesOperationAction$NodeTransportHandler.messageReceived(TransportNodesOperationAction.java:267)
at
org.elasticsearch.transport.netty.MessageChannelHandler$RequestHandler.run(MessageChannelHandler.java:400)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
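
As a rough aid for reading the timestamps above, the following Python sketch computes how long Doorman was a member of the cluster according to the Node 1 log, and the ping failure window implied by the "tried [3] times, each with maximum [30s] timeout" message. The names `ADDED`, `REMOVED`, `PING_RETRIES` and `PING_TIMEOUT_S` are illustrative only. It assumes the three pings are retried one after another, which the log message suggests but does not confirm.

```python
from datetime import datetime

# Timestamps copied from the Node 1 log above.
ADDED = "2012-07-17 15:54:59,850"    # Doorman added to the cluster
REMOVED = "2012-07-17 15:57:08,945"  # Doorman removed after failed pings

# Values taken from the "failed to ping" message in the log.
PING_RETRIES = 3
PING_TIMEOUT_S = 30


def parse(ts: str) -> datetime:
    """Parse a log timestamp (a comma separates seconds and milliseconds)."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S,%f")


def seconds_between(start: str, end: str) -> float:
    """Elapsed seconds between two log timestamps."""
    return (parse(end) - parse(start)).total_seconds()


member_for = seconds_between(ADDED, REMOVED)
# Assumption: retries run sequentially, so the window is retries * timeout.
detection_window = PING_RETRIES * PING_TIMEOUT_S

print(f"Doorman was a cluster member for {member_for:.3f}s")
print(f"Ping failure window (retries * timeout): {detection_window}s")
```

The same `seconds_between` helper can be pointed at any other pair of timestamps in the two logs, for example the `master_left` entry on Node 2 and the later join of Asmodeus.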


(system) #2