Configuring multiple nodes

kimchy · April 23, 2010, 3:34pm

Hi,

Great that things are finally working for you! I am working on a new
discovery module that will replace the jgroups one, which I hope will
eliminate the problems you were facing...

cheers,
shay.banon

On Thu, Apr 22, 2010 at 12:19 PM, alexandre gerlic <alexandre.gerlic@gmail.com> wrote:

OK, finally: after the node discovery crash, I stopped my first node,
which created the gateway fs files during shutdown.
Then I restarted it.
It seems to be working: my 2 nodes are still connected after 1 day.

2010/4/21 alexandre gerlic alexandre.gerlic@gmail.com:

I am using tcp and the binary release: 0.6.0
discovery:
  jgroups:
    config: tcp
    bind_port: 9700
    tcpping:
      initial_hosts: ip1[9700], ip2[9700]

thx for your help
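
Editor's note: with a tcp/tcpping setup like the one above, each node has to be able to open a TCP connection to every entry in initial_hosts. The following is a minimal Python sketch for checking that from each machine. The function names parse_initial_hosts and can_connect are hypothetical helpers written for this note; they are not part of elasticsearch or jgroups, and the sketch only assumes the host[port] notation shown in the config.

```python
import re
import socket


def parse_initial_hosts(value):
    """Parse a 'host[port], host[port]' string into a list of (host, port) pairs."""
    pairs = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        # Each entry is expected to look like name[1234]
        match = re.fullmatch(r"([^\[\]\s]+)\[(\d+)\]", entry)
        if match is None:
            raise ValueError(f"unrecognised entry: {entry!r}")
        pairs.append((match.group(1), int(match.group(2))))
    return pairs


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable names
        return False
```

Running can_connect over the pairs returned by parse_initial_hosts("ip1[9700], ip2[9700]") on both machines would show whether the discovery port is reachable in each direction. A successful connection only shows that the port is open; it says nothing about jgroups membership itself.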

2010/4/21 Shay Banon shay.banon@elasticsearch.com:

This is a very strange behavior that I get with jgroups sometimes and
still have not managed to recreate it. I am working on a workaround for this.
Remind me, are you using udp or tcp with jgroups?
cheers,
shay.banon

On Wed, Apr 21, 2010 at 4:13 PM, alexandre gerlic
alexandre.gerlic@gmail.com wrote:

Too fast:
in fact it works for a while until I receive:

host1 :
[13:33:20,054][WARN ][jgroups.FD ] I was suspected by
host2-28908; ignoring the SUSPECT message and sending back a
HEARTBEAT_ACK
[13:33:20,274][INFO ][cluster.service ] [Nighthawk] Master
{New [Nighthawk][host1-34919][data][inet[host1.domain1.com/ip1:9300]],
Previous [Anole][host2-28908][data][inet[sd-5175/ip2:9300]]}, Removed
{[Anole][host2-28908][data][inet[host2/ip2:9300]],}
Exception in thread "elasticsearch[Nighthawk][tp]-pool-1-thread-9"
java.lang.NullPointerException
   at org.elasticsearch.transport.SendRequestTransportException.<init>(SendRequestTransportException.java:30)
   at org.elasticsearch.transport.TransportService$2.run(TransportService.java:152)
   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:619)
[13:33:50,288][WARN ][jgroups.pbcast.NAKACK ] host1-34919: dropped
message from host2-28908 (not in xmit_table), keys are [host1-34919],
view=[host1-34919|2] [host1-34919]

host2:
[13:33:32,332][WARN ][jgroups.FD ] I was suspected by
host1-34919; ignoring the SUSPECT message and sending back a
HEARTBEAT_ACK
[13:33:32,335][WARN ][jgroups.pbcast.GMS ] host2-28908: not
member of view [host1-34919|2] [host1-34919]; discarding it
[13:33:32,601][WARN ][discovery.jgroups ] [Anole] Received a
wrong address type from [host1-34919], ignoring...
(received_address[org.elasticsearch.util.transport.DummyTransportAddress@e8d404)
[13:33:32,674][WARN ][discovery.jgroups ] [Anole] Received a
wrong address type from [host1-34919], ignoring...
(received_address[org.elasticsearch.util.transport.DummyTransportAddress@f278dd)
[13:33:32,675][WARN ][discovery.jgroups ] [Anole] Received a
wrong address type from [host1-34919], ignoring...
(received_address[org.elasticsearch.util.transport.DummyTransportAddress@dd3333)
[13:33:32,675][WARN ][discovery.jgroups ] [Anole] Received a
wrong address type from [host1-34919], ignoring...
(received_address[org.elasticsearch.util.transport.DummyTransportAddress@4c963c)

Everything works fine (_cluster/nodes shows 2 nodes) until the
first error message; after that, _cluster/nodes shows only 1 node
and the nodes are not able to reconnect to each other.

2010/4/20 alexandre gerlic alexandre.gerlic@gmail.com:

OK, finally I modified my hosts to add my node and it is now working.
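
Editor's note: the fix above (adding the node to the hosts entries) is consistent with the UnresolvedAddressException reported earlier in the thread, which Java raises when a socket address has no resolved IP. The following is a minimal Python sketch for checking name resolution on each machine. The helpers resolve and unresolved are hypothetical names written for this note, and the host names you pass in are whatever appears in your own bindHost, publishHost and initial_hosts settings.

```python
import socket


def resolve(host):
    """Return the sorted addresses a name resolves to, or [] if it cannot be resolved."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return []
    # Each result is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address
    return sorted({info[4][0] for info in infos})


def unresolved(hosts):
    """Return the subset of hosts that do not resolve on this machine."""
    return [host for host in hosts if not resolve(host)]
```

Calling unresolved with the local machine's own hostname plus every configured peer name, on every node, should return an empty list. Any name it returns is a candidate for a missing hosts or DNS entry.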

2010/4/19 alexandre gerlic alexandre.gerlic@gmail.com:

Some more info:
it seems to be the same issue as:

http://github.com/elasticsearch/elasticsearch/issues/labels/bug#issue/40

I downloaded the latest version and installed it on my bad node.

new error :
[21:48:36,042][WARN ][jgroups.pbcast.NAKACK ] host1-7963: dropped
message from host2-58528 (not in xmit_table), keys are [host1-7963],
view=[host1-7963|0] [host1-7963]

I tried changing the sh script to add:
java.net.preferIPv4Stack=false
java.net.preferIPv6Stack=false

but a new error appeared :
[WARN ][jgroups.TCP] no physical address ....

It seems to be a jgroups issue.

2010/4/19 alexandre gerlic alexandre.gerlic@gmail.com:

Thx, I updated my config file.
My stack was not complete, so I updated it:
371357’s gists · GitHub
I will continue to investigate. It is very strange; my network seems
good.

2010/4/19 Shay Banon shay.banon@elasticsearch.com:

It seems like the host you provide in the configuration can't be
resolved... Something with the network configuration?
As a side note, it makes little sense to configure a gateway for an index
without defining a gateway for the whole cluster. You should configure the
gateway in the following manner:
gateway:
  type: fs
  fs.location: /path
This will cause any index created on the node to use the fs gateway
automatically.
The "top level" gateway is important since it stores all the cluster
metadata, such as indices created, mappings, and so on.
cheers,
shay.banon

On Mon, Apr 19, 2010 at 8:58 PM, alexandre gerlic
alexandre.gerlic@gmail.com wrote:

Hi, thx. More information below.

stack : 371357’s gists · GitHub

ip1 config file:
cluster:
  name: clustername
network:
  bindHost: ip1
  publishHost: ip1
index.engine.robin.refreshInterval: -1
index.gateway.snapshot_interval: -1
index.gateway.fs.location: /path
index.gateway.type: fs
index.number_of_shards : 5
index.number_of_replicas : 1

index :
  store:
    fs:
      memory:
        enabled: true
discovery:
  jgroups:
    config: tcp
    bind_port: 9700
    tcpping:
      initial_hosts: ip1[9700], ip2[9700]

My first node is running on Ubuntu 8.04 without problems;
the second one, on Ubuntu 9.10, throws this exception.

When I call:
http://ip1:9200/_cluster/nodes : only 1 node
http://ip2:9200/_cluster/nodes : 2 nodes

I checked the firewall and it seems OK.
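
Editor's note: the symptom above, where each node reports a different membership, can be checked mechanically by asking every node for its view and comparing the results. The following is a minimal Python sketch. It assumes that the _cluster/nodes response is a JSON object with a top-level "nodes" object keyed by node id; that shape is an assumption for this release and should be verified against real output. The helper names fetch_nodes, node_ids and views_agree are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_nodes(base_url, timeout=5.0):
    """GET <base_url>/_cluster/nodes and return the parsed JSON body."""
    with urlopen(f"{base_url}/_cluster/nodes", timeout=timeout) as resp:
        return json.load(resp)


def node_ids(response):
    """Return the set of node ids in a response (assumes a top-level 'nodes' object)."""
    return set(response.get("nodes", {}))


def views_agree(responses):
    """Return True if every response lists exactly the same set of node ids."""
    views = [node_ids(response) for response in responses.values()]
    return all(view == views[0] for view in views)
```

For the situation described here, one would build a dict such as {url: fetch_nodes(url) for url in ("http://ip1:9200", "http://ip2:9200")} and pass it to views_agree; a False result corresponds to the split view reported in this message.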

2010/4/19 Shay Banon shay.banon@elasticsearch.com:

Can you post the full stack trace? Are you running in an embedded mode?
cheers,
shay.banon

On Mon, Apr 19, 2010 at 2:45 AM, alexandre gerlic
alexandre.gerlic@gmail.com wrote:

Hi,
I have the same issue on one of my nodes:
Exception caught on netty layer
java.nio.channels.UnresolvedAddressException
at sun.nio.ch.Net.checkAddress(Net.java:30)

I tried your fix without success.

Do you have any update on this issue ?

Thx

2010/3/31 Shay Banon shay.banon@elasticsearch.com:

Yes, very strange. I really hope to get around this when I upgrade to the
upcoming jgroups version (which is still in alpha stage, so I am waiting).
-shay.banon

On Wed, Mar 31, 2010 at 2:54 AM, Gareth Stokes gareth@betechnology.com.au wrote:

I did think that was bizarre, and it was only happening on one machine in
the cluster so i can't exactly replicate the issue. just thought i would
document in case anyone else has the same problem.

On 30 March 2010 18:09, Shay Banon shay.banon@elasticsearch.com wrote:

I am not sure that the exception you posted on the first mail relates to
the updated configuration since the exception is from the netty layer (the
transport) and jgroups fix is for the discovery layer.
In general, you don't have to set the bind_addr, since it should default
to the network.bindHost (assuming both use ipv4/ipv6).
-shay.banon

On Tue, Mar 30, 2010 at 8:15 AM, Gareth Stokes gareth@betechnology.com.au wrote:

found the problem, ended up being that i didn't have
discovery.jgroups.bind_addr set, here is the config that worked in case
anyone else has the same problem:
network:
  bindHost: storage1.example.com
  publishHost: storage1.example.com
transport:
  netty:
    port: 9400
http:
  netty:
    enabled: true
    port: 9401
cluster:
  name: ExampleIndexer
discovery:
  jgroups:
    config: tcp
    bind_port: 9700
    bind_addr: storage1.example.com
    tcpping:
      initial_hosts: storage1.example.com[9700],storage2.example.com[9700]

On 30 March 2010 13:08, gareth stokes gareth@betechnology.com.au wrote:

I'm having a lot of problems getting multiple nodes talking to each
other, for some reason netty keeps on giving me errors.

[01:57:20,724][WARN ][transport.netty ] [Alchemy] Exception caught on netty layer [[id: 0x11fb24d3]]
java.nio.channels.UnresolvedAddressException
at sun.nio.ch.Net.checkAddress(Net.java:30)

now i'm sure this has to do with the way i've configured my setup but
for the life of me i can't see what i'm missing??
this is my config file

network :
  bindHost : storage1.example.com
  publishHost : storage1.example.com
transport :
  netty :
    port : 9300
cluster :
  name : StorageIndexer
discovery :
  jgroups :
    config : tcp
    bind_port : 9400
    tcpping :
      initial_hosts : storage1.example.com[9400],storage2.example.com[9400]

--
Alexandre Gerlic

