Hi,
I've recently begun testing Elasticsearch on a small mockup I've designed.
Currently, I'm running two nodes in two LXC (v0.9) containers. The containers
are attached via veth interfaces to a bridge defined on the host.
When I start the first node, the cluster starts, but when I start the
second node a bit later, it seems to get some information from the other
node, yet it always ends up with the same "no matching id" error.
Here's what I'm doing:
First, I start the LXC container of the first node:
root@lada:~# date && lxc-start -n es_node1 -d
Wednesday 12 March 2014, 22:59:39 (UTC+0100)
I log on to the node and check the log file:
[2014-03-12 21:59:41,927][INFO ][node ] [Node ES #1]
version[0.90.12], pid[1129], build[26feed7/2014-02-25T15:38:23Z]
[2014-03-12 21:59:41,928][INFO ][node ] [Node ES #1]
initializing ...
[2014-03-12 21:59:41,944][INFO ][plugins ] [Node ES #1]
loaded [], sites []
[2014-03-12 21:59:47,262][INFO ][node ] [Node ES #1]
initialized
[2014-03-12 21:59:47,263][INFO ][node ] [Node ES #1]
starting ...
[2014-03-12 21:59:47,485][INFO ][transport ] [Node ES #1]
bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address
{inet[/172.16.0.100:9300]}
[2014-03-12 21:59:57,573][INFO ][cluster.service ] [Node ES #1]
new_master [Node ES
#1][LbMQazWXR9uB6Q7R2xLxGQ][inet[/172.16.0.100:9300]]{master=true}, reason:
zen-disco-join (elected_as_master)
[2014-03-12 21:59:57,657][INFO ][discovery ] [Node ES #1]
logstash/LbMQazWXR9uB6Q7R2xLxGQ
[2014-03-12 21:59:57,733][INFO ][http ] [Node ES #1]
bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address
{inet[/172.16.0.100:9200]}
[2014-03-12 21:59:57,735][INFO ][node ] [Node ES #1]
started
[2014-03-12 21:59:59,569][INFO ][gateway ] [Node ES #1]
recovered [2] indices into cluster_state
Then I start the second node:
root@lada:/var/lib/lxc/kibana# date && lxc-start -n es_node2 -d
Wednesday 12 March 2014, 23:02:59 (UTC+0100)
I log on to the second node and open its log:
[2014-03-12 22:03:02,126][INFO ][node ] [Node ES #2]
version[0.90.12], pid[1128], build[26feed7/2014-02-25T15:38:23Z]
[2014-03-12 22:03:02,127][INFO ][node ] [Node ES #2]
initializing ...
[2014-03-12 22:03:02,141][INFO ][plugins ] [Node ES #2]
loaded [], sites []
[2014-03-12 22:03:07,352][INFO ][node ] [Node ES #2]
initialized
[2014-03-12 22:03:07,352][INFO ][node ] [Node ES #2]
starting ...
[2014-03-12 22:03:07,557][INFO ][transport ] [Node ES #2]
bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address
{inet[/172.16.0.101:9300]}
[2014-03-12 22:03:17,637][INFO ][cluster.service ] [Node ES #2]
new_master [Node ES
#2][0nNCsZrFS6y95G1ld-v_rA][inet[/172.16.0.101:9300]]{master=true}, reason:
zen-disco-join (elected_as_master)
[2014-03-12 22:03:17,718][INFO ][discovery ] [Node ES #2]
logstash/0nNCsZrFS6y95G1ld-v_rA
[2014-03-12 22:03:17,783][INFO ][http ] [Node ES #2]
bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address
{inet[/172.16.0.101:9200]}
[2014-03-12 22:03:17,785][INFO ][node ] [Node ES #2]
started
[2014-03-12 22:03:19,550][INFO ][gateway ] [Node ES #2]
recovered [2] indices into cluster_state
[2014-03-12 22:03:52,709][WARN ][discovery.zen.ping.multicast] [Node ES #2]
received ping response ping_response{target [[Node ES
#1][LbMQazWXR9uB6Q7R2xLxGQ][inet[/172.16.0.100:9300]]{master=true}], master
[[Node ES
#1][LbMQazWXR9uB6Q7R2xLxGQ][inet[/172.16.0.100:9300]]{master=true}],
cluster_name[logstash]} with no matching id [1]
At that point, each node considers itself the master.
Here's my configuration for node #2 (node #1 is the same, except for
node.name):
cluster.name: logstash
node.name: "Node ES #2"
node.master: true
node.data: true
index.number_of_shards: 2
index.number_of_replicas: 1
discovery.zen.ping.timeout: 10s
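For what it's worth, I'm relying on the default multicast discovery and haven't changed any other discovery settings. If I understand the documentation correctly, forcing unicast discovery instead would look something like the following (the host list is just my two nodes; I haven't tried this yet on my setup):

```yaml
# Disable multicast pinging and list the cluster members explicitly
discovery.zen.ping.multicast.enabled: false
discovery.zen.ping.unicast.hosts: ["172.16.0.100", "172.16.0.101"]
```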
The bridge on my host is set up to start forwarding on new interfaces
immediately, so I don't think the problem lies there. Here's the bridge config:
auto br1
iface br1 inet static
address 172.16.0.254
netmask 255.255.255.0
bridge_ports regex veth_.*
bridge_stp off
bridge_maxwait 0
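One thing I can think of double-checking is whether the kernel bridge is doing IGMP/multicast snooping, which could end up filtering the multicast discovery pings between the containers. A quick check (the sysfs path assumes a reasonably recent kernel; br1 is my bridge name):

```shell
#!/bin/sh
# Print the multicast snooping setting for bridge br1 (1 = snooping enabled).
# With snooping on and no multicast querier present, multicast frames
# between bridge ports may be dropped.
SNOOP=/sys/class/net/br1/bridge/multicast_snooping
if [ -r "$SNOOP" ]; then
    cat "$SNOOP"
else
    echo "bridge br1 not present on this host"
fi
```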
The network configuration on each container is the same (IP aside). Here's
node #1's:
root@es_node1:~# cat /etc/network/interfaces
auto lo
iface lo inet loopback
auto eth0
iface eth0 inet static
address 172.16.0.100
netmask 255.255.255.0
gateway 172.16.0.254
Node #2 is identical, except for its IP, 172.16.0.101.
Elasticsearch version:
root@es_node1:~# dpkg -l elasticsearch
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-
pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-===================-==============-==============-============================================
ii elasticsearch 0.90.12 all Open Source,
Distributed, RESTful Search Eng
Host OS version:
root@lada:~# uname -a
Linux lada 3.12-1-amd64 #1 SMP Debian 3.12.6-2 (2013-12-29) x86_64 GNU/Linux
root@lada:~# cat /etc/debian_version
jessie/sid
LXC information:
root@lada:~# dpkg -l "lxc"
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-
pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-======================-================-================-==================================================
ii lxc 0.9.0~alpha3-2+d amd64 Linux
Containers userspace tools
LXC container OS: Debian stable 7.4
If I stop the Elasticsearch service on node #2 and then restart it, it manages
to join the cluster. However, having a node fail to join the cluster at
server boot is a big problem for me, and it really doesn't seem normal.
Does anyone have a clue about what's going on?
Thanks a lot for your help.