On 31 Oct., 11:04, Steff st...@designware.dk wrote:
We shut down using the method described in the Elasticsearch shutdown documentation.
Seems to work fine. But when starting the nodes again, shards may get relocated, even though all nodes are started very quickly after each other. Isn't relocation of shards supposed to be avoided if you restart all "gateway.expected_nodes" nodes within the "gateway.recover_after_time" period? We started all nodes within half a minute and we have "gateway.recover_after_time" set to 5m, but shards were still relocated. Any explanation or elaboration? Why does this happen?
Any comment on this one? It can be reproduced in a very simple way (elasticsearch-0.18.2).
My elasticsearch.yml looks like this:
cluster.name: tltsteff_es
action.auto_create_index: false
discovery.zen.ping.multicast.enabled: true
node.data: true
gateway.recover_after_nodes: 1
gateway.recover_after_time: 5m
gateway.expected_nodes: 3
discovery.zen.minimum_master_nodes: 1
discovery.zen.ping.timeout: 3s
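For clarity, this is how I read the three gateway settings above (my understanding of the docs - please correct me if it is wrong):
gateway.recover_after_nodes: 1   # recovery may start once at least 1 node has joined...
gateway.recover_after_time: 5m   # ...but only after waiting up to 5m from that point...
gateway.expected_nodes: 3        # ...unless all 3 expected nodes have joined, in which case recovery starts right away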
Step-by-step to reproduce (I'm running on Mac OS X Snow Leopard):
- cd <elasticsearch-0.18.2-install>
- Execute the following command 3 times in quick succession, in order to start 3 nodes in the same cluster: ./bin/elasticsearch
- Start elasticsearch-head and connect to http://localhost:9200/ -
wait until all nodes have started and joined the cluster (green state)
- Using elasticsearch-head, create 3 indices ("index1", "index2" and "index3"), each with 3 shards and 1 replica (a curl equivalent is sketched after this list)
- Observe how the shards are distributed among nodes. They are
probably very nicely distributed (each node running one primary shard
and one replica for each index)
- Stop the cluster using: curl -XPOST 'http://localhost:9200/_shutdown'
- Execute the following command 3 times in quick succession, in order to start 3 nodes in the same cluster: ./bin/elasticsearch
- Observe how the shards are distributed among nodes - the
distribution has probably changed
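For anyone who prefers curl over elasticsearch-head, here is a rough equivalent of the create/observe steps (a sketch, not what I actually ran - it assumes the default HTTP port 9200):

# create the three test indices, each with 3 shards and 1 replica
for idx in index1 index2 index3; do
  curl -XPUT "http://localhost:9200/$idx" -d '{"settings": {"index": {"number_of_shards": 3, "number_of_replicas": 1}}}'
done

# dump the shard-to-node routing so it can be compared before and after the restart
curl 'http://localhost:9200/_cluster/state?pretty=true'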
I do not understand why shards are relocated among the 3 nodes just because the cluster consisting of 3 local nodes is shut down and 3 new local nodes are started. I would expect each of them simply to take over the shard allocation of one of the 3 nodes that ran before the shutdown, and not do any relocation of shards.
It does not seem to be the gateway recovery itself that causes this moving of shards around - at least, running with gateway DEBUG logging, I see the following line in the log: delaying initial state recovery for [5m]
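(For reference, I enable the gateway DEBUG logging with something like the following in config/logging.yml - a minimal sketch, assuming the default log4j-style layout that ships with 0.18.2:

logger:
  # log gateway recovery/allocation decisions
  gateway: DEBUG
)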
But I also see the following in the log:
[2011-11-02 13:13:42,638][INFO ][cluster.service ] [Hurricane] new_master [Hurricane][98CZj6QVSy2aiAEr1m_lxg][inet[/192.168.1.107:9300]], reason: zen-disco-join (elected_as_master)
[2011-11-02 13:13:42,647][INFO ][discovery ] [Hurricane] tltsteff_es/98CZj6QVSy2aiAEr1m_lxg
[2011-11-02 13:13:42,653][DEBUG][gateway.local ] [Hurricane] [find_latest_state]: loading metadata from [/Applications/elasticsearch-0.18.2/data/tltsteff_es/nodes/0/_state/metadata-4]
[2011-11-02 13:13:42,655][DEBUG][gateway.local ] [Hurricane] [find_latest_state]: loading started shards from [/Applications/elasticsearch-0.18.2/data/tltsteff_es/nodes/0/_state/shards-20]
[2011-11-02 13:13:42,656][DEBUG][gateway ] [Hurricane] delaying initial state recovery for [5m]
[2011-11-02 13:13:42,660][INFO ][http ] [Hurricane] bound_address {inet[/0.0.0.0:9200]}, publish_address {inet[/192.168.1.107:9200]}
[2011-11-02 13:13:42,660][INFO ][node ] [Hurricane] {0.18.2}[1462]: started
[2011-11-02 13:13:44,465][INFO ][cluster.service ] [Hurricane] added {[Cap 'N Hawk][W32DAJKFRV2zMPqw3vnuxA][inet[/192.168.1.107:9302]],}, reason: zen-disco-receive(join from node[[Cap 'N Hawk][W32DAJKFRV2zMPqw3vnuxA][inet[/192.168.1.107:9302]]])
[2011-11-02 13:13:44,492][INFO ][cluster.service ] [Cap 'N Hawk] detected_master [Hurricane][98CZj6QVSy2aiAEr1m_lxg][inet[/192.168.1.107:9300]], added {[Hurricane][98CZj6QVSy2aiAEr1m_lxg][inet[/192.168.1.107:9300]],}, reason: zen-disco-receive(from master [[Hurricane][98CZj6QVSy2aiAEr1m_lxg][inet[/192.168.1.107:9300]]])
[2011-11-02 13:13:44,506][INFO ][discovery ] [Cap 'N Hawk] tltsteff_es/W32DAJKFRV2zMPqw3vnuxA
[2011-11-02 13:13:44,511][DEBUG][gateway.local ] [Cap 'N Hawk] [find_latest_state]: loading metadata from [/Applications/elasticsearch-0.18.2/data/tltsteff_es/nodes/2/_state/metadata-4]
[2011-11-02 13:13:44,513][DEBUG][gateway.local ] [Cap 'N Hawk] [find_latest_state]: loading started shards from [/Applications/elasticsearch-0.18.2/data/tltsteff_es/nodes/2/_state/shards-20]
[2011-11-02 13:13:44,531][INFO ][http ] [Cap 'N Hawk] bound_address {inet[/0.0.0.0:9201]}, publish_address {inet[/192.168.1.107:9201]}
[2011-11-02 13:13:44,533][INFO ][node ] [Cap 'N Hawk] {0.18.2}[1490]: started
[2011-11-02 13:13:46,581][INFO ][cluster.service ] [Hurricane] added {[Shamrock][DRBdotspSPmv6HUHgpmuog][inet[/192.168.1.107:9301]],}, reason: zen-disco-receive(join from node[[Shamrock][DRBdotspSPmv6HUHgpmuog][inet[/192.168.1.107:9301]]])
[2011-11-02 13:13:46,590][INFO ][cluster.service ] [Cap 'N Hawk] added {[Shamrock][DRBdotspSPmv6HUHgpmuog][inet[/192.168.1.107:9301]],}, reason: zen-disco-receive(from master [[Hurricane][98CZj6QVSy2aiAEr1m_lxg][inet[/192.168.1.107:9300]]])
[2011-11-02 13:13:46,602][DEBUG][gateway.local ] [Shamrock] [find_latest_state]: loading metadata from [/Applications/elasticsearch-0.18.2/data/tltsteff_es/nodes/1/_state/metadata-4]
[2011-11-02 13:13:46,604][DEBUG][gateway.local ] [Shamrock] [find_latest_state]: loading started shards from [/Applications/elasticsearch-0.18.2/data/tltsteff_es/nodes/1/_state/shards-20]
[2011-11-02 13:13:46,606][INFO ][cluster.service ] [Shamrock] detected_master [Hurricane][98CZj6QVSy2aiAEr1m_lxg][inet[/192.168.1.107:9300]], added {[Hurricane][98CZj6QVSy2aiAEr1m_lxg][inet[/192.168.1.107:9300]],[Cap 'N Hawk][W32DAJKFRV2zMPqw3vnuxA][inet[/192.168.1.107:9302]],}, reason: zen-disco-receive(from master [[Hurricane][98CZj6QVSy2aiAEr1m_lxg][inet[/192.168.1.107:9300]]])
[2011-11-02 13:13:46,609][DEBUG][gateway.local ] [Hurricane] elected state from [[Hurricane][98CZj6QVSy2aiAEr1m_lxg][inet[/192.168.1.107:9300]]]
[2011-11-02 13:13:46,611][INFO ][discovery ] [Shamrock] tltsteff_es/DRBdotspSPmv6HUHgpmuog
[2011-11-02 13:13:46,620][INFO ][http ] [Shamrock] bound_address {inet[/0.0.0.0:9202]}, publish_address {inet[/192.168.1.107:9202]}
[2011-11-02 13:13:46,620][INFO ][node ] [Shamrock] {0.18.2}[1476]: started
[2011-11-02 13:13:46,627][DEBUG][gateway.local ] [Hurricane] [index1][0]: allocating [[index1][0], node[null], [P], s[UNASSIGNED]] to [[Cap 'N Hawk][W32DAJKFRV2zMPqw3vnuxA][inet[/192.168.1.107:9302]]] on primary allocation
[2011-11-02 13:13:46,629][DEBUG][gateway.local ] [Hurricane] [index1][1]: allocating [[index1][1], node[null], [P], s[UNASSIGNED]] to [[Hurricane][98CZj6QVSy2aiAEr1m_lxg][inet[/192.168.1.107:9300]]] on primary allocation
[2011-11-02 13:13:46,631][DEBUG][gateway.local ] [Hurricane] [index1][2]: allocating [[index1][2], node[null], [P], s[UNASSIGNED]] to [[Hurricane][98CZj6QVSy2aiAEr1m_lxg][inet[/192.168.1.107:9300]]] on primary allocation
[2011-11-02 13:13:46,633][DEBUG][gateway.local ] [Hurricane] [index2][0]: allocating [[index2][0], node[null], [P], s[UNASSIGNED]] to [[Cap 'N Hawk][W32DAJKFRV2zMPqw3vnuxA][inet[/192.168.1.107:9302]]] on primary allocation
[2011-11-02 13:13:46,635][DEBUG][gateway.local ] [Hurricane] [index2][1]: allocating [[index2][1], node[null], [P], s[UNASSIGNED]] to [[Hurricane][98CZj6QVSy2aiAEr1m_lxg][inet[/192.168.1.107:9300]]] on primary allocation
[2011-11-02 13:13:46,637][DEBUG][gateway.local ] [Hurricane] [index2][2]: allocating [[index2][2], node[null], [P], s[UNASSIGNED]] to [[Hurricane][98CZj6QVSy2aiAEr1m_lxg][inet[/192.168.1.107:9300]]] on primary allocation
[2011-11-02 13:13:46,639][DEBUG][gateway.local ] [Hurricane] [index3][0]: allocating [[index3][0], node[null], [P], s[UNASSIGNED]] to [[Cap 'N Hawk][W32DAJKFRV2zMPqw3vnuxA][inet[/192.168.1.107:9302]]] on primary allocation
[2011-11-02 13:13:46,641][DEBUG][gateway.local ] [Hurricane] [index3][1]: allocating [[index3][1], node[null], [P], s[UNASSIGNED]] to [[Shamrock][DRBdotspSPmv6HUHgpmuog][inet[/192.168.1.107:9301]]] on primary allocation
[2011-11-02 13:13:46,643][DEBUG][gateway.local ] [Hurricane] [index3][2]: allocating [[index3][2], node[null], [P], s[UNASSIGNED]] to [[Cap 'N Hawk][W32DAJKFRV2zMPqw3vnuxA][inet[/192.168.1.107:9302]]] on primary allocation
[2011-11-02 13:13:47,028][DEBUG][index.gateway ] [Hurricane] [index1][1] starting recovery from local ...
At the start of the log it seems like each of the 3 new nodes (Hurricane, Cap 'N Hawk and Shamrock) picks up its own node folder in the data folder (nodes/0, nodes/2 and nodes/1 under /Applications/elasticsearch-0.18.2/data/tltsteff_es/, respectively). That's nice. But afterwards it seems like Hurricane (the master, I guess) finds out that all 9 shards are UNASSIGNED and starts assigning them to the running nodes, apparently without taking into consideration which nodes already hold the shard data locally. I do not understand why this happens. It is not so bad in this case, where the nodes all run on the same machine and there is no data in any of the indices, but in general it is stupid to move shards around after a restart when "no nodes are missing".
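(A simple way to see which shards each node folder actually holds on disk, before the shutdown and after the restart - assuming the layout data/<cluster>/nodes/<n>/indices/<index>/<shard>, which is what I see on 0.18.2; run from the install dir:

ls -d data/tltsteff_es/nodes/*/indices/*/*
)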
What am I missing? Some explanation would be greatly appreciated.
Regards, Per Steffensen