Hi all,
I might have found a bug in the way a non-master node reconnects to a
master.
I use the Java API to create a node using this (subset of) properties:
Settings settings = ImmutableSettings.settingsBuilder()
    .put("node.data", false)
    .put("node.local", false)
    .put("node.master", false)
    .put("network.host", "127.0.0.1")
    .put("discovery.type", "zen")
    .put("discovery.zen.minimum_master_nodes", 1)
    .put("discovery.zen.ping.multicast.enabled", false)
    .putArray("discovery.zen.ping.unicast.hosts", "127.0.0.1")
    .build();
On the same host, there is a single standalone instance, configured
with this elasticsearch.yml:
node.master: true
node.data: true
network.host: 127.0.0.1
discovery.type: zen
discovery.zen.minimum_master_nodes: 1
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["127.0.0.1"]
The client is able to connect just fine. However, as soon as I restart
the standalone instance (i.e. the master), the client produces the
following logs:
INFO - zen - [Arturo Falcones] master_left [[Rhodes, James][U9u4pvSYSMiu2D7pCavK9Q][inet[/127.0.0.1:9300]]{master=true}], reason [transport disconnected (with verified connect)]
WARN - zen - [Arturo Falcones] not enough master nodes after master left (reason = transport disconnected (with verified connect)), current nodes: {[Arturo Falcones][aP03z07qSFaewHW0piuNiA][inet[/127.0.0.1:9301]]{data=false, local=false, master=false},}
INFO - service - [Arturo Falcones] removed {[Rhodes, James][U9u4pvSYSMiu2D7pCavK9Q][inet[/127.0.0.1:9300]]{master=true},}, reason: zen-disco-master_failed ([Rhodes, James][U9u4pvSYSMiu2D7pCavK9Q][inet[/127.0.0.1:9300]]{master=true})
INFO - service - [Arturo Falcones] detected_master [Turac][d9uKtBXERde7zrmNIe9E5A][inet[/127.0.0.1:9300]]{master=true}, added {[Turac][d9uKtBXERde7zrmNIe9E5A][inet[/127.0.0.1:9300]]{master=true},}, reason: zen-disco-receive(from master [[Turac][d9uKtBXERde7zrmNIe9E5A][inet[/127.0.0.1:9300]]{master=true}])
It looks as if the client reconnected just fine. The problem, though,
is that every search (I didn't try any other operations yet) results
in an IndexMissingException:
Caused by: org.elasticsearch.indices.IndexMissingException: [default] missing
    at org.elasticsearch.cluster.routing.operation.plain.PlainOperationRouting.indexRoutingTable(PlainOperationRouting.java:230)
    at org.elasticsearch.cluster.routing.operation.plain.PlainOperationRouting.searchShards(PlainOperationRouting.java:175)
    at org.elasticsearch.action.search.type.TransportSearchTypeAction$BaseAsyncAction.<init>(TransportSearchTypeAction.java:118)
    at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.<init>(TransportSearchQueryThenFetchAction.java:70)
    at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction$AsyncAction.<init>(TransportSearchQueryThenFetchAction.java:61)
    at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:58)
    at org.elasticsearch.action.search.type.TransportSearchQueryThenFetchAction.doExecute(TransportSearchQueryThenFetchAction.java:48)
    at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:61)
    at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:108)
    at org.elasticsearch.action.search.TransportSearchAction.doExecute(TransportSearchAction.java:43)
    at org.elasticsearch.action.support.TransportAction.execute(TransportAction.java:61)
    at org.elasticsearch.client.node.NodeClient.execute(NodeClient.java:83)
    at org.elasticsearch.client.support.AbstractClient.search(AbstractClient.java:206)
    at org.elasticsearch.action.search.SearchRequestBuilder.doExecute(SearchRequestBuilder.java:743)
    at org.elasticsearch.action.support.BaseRequestBuilder.execute(BaseRequestBuilder.java:53)
    at org.elasticsearch.action.support.BaseRequestBuilder.execute(BaseRequestBuilder.java:47)
Browsing the default index with elasticsearch-head works just fine,
and restarting the client solves the problem too. Hence I'm assuming
the routing metadata is not correctly restored on the client after it
reconnects.
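As a stop-gap, I'm experimenting with wrapping the node behind a tiny supervisor that rebuilds it when a search fails this way. This is only a sketch of the pattern with hypothetical stand-in types (NodeHandle, RestartingClient) — it is not the Elasticsearch API:

```java
import java.util.function.Supplier;

// Sketch of the "restart the client node after the master bounced" workaround.
// NodeHandle is a hypothetical stand-in for the real node/client; real code
// would rebuild the node from the settings shown above.
public class RestartingClient {

    interface NodeHandle {
        // Throws IllegalStateException here where the real client
        // would throw IndexMissingException.
        String search(String index);
        void close();
    }

    private final Supplier<NodeHandle> factory; // builds a node from scratch
    private NodeHandle node;

    public RestartingClient(Supplier<NodeHandle> factory) {
        this.factory = factory;
        this.node = factory.get();
    }

    // Run the search; on a missing-index error, tear the node down,
    // build a fresh one (which rejoins and re-fetches cluster state),
    // and retry once.
    public String search(String index) {
        try {
            return node.search(index);
        } catch (IllegalStateException e) {
            node.close();
            node = factory.get();
            return node.search(index);
        }
    }

    // Self-contained demo: the first node instance always fails, the
    // rebuilt one succeeds -- mirroring what I see after a manual restart.
    static String demo() {
        int[] built = {0};
        Supplier<NodeHandle> factory = () -> {
            int id = ++built[0];
            return new NodeHandle() {
                public String search(String index) {
                    if (id == 1) throw new IllegalStateException("[" + index + "] missing");
                    return "ok";
                }
                public void close() {}
            };
        };
        return new RestartingClient(factory).search("default");
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "ok"
    }
}
```

Obviously I'd rather not ship that retry-by-rebuild loop, hence the question below.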
Is this indeed a bug, or should I explicitly restart the node as soon
as the master goes down?
Cheers, Stefan