Configuring multiple nodes

Gareth_Stokes · March 30, 2010, 2:08am

I'm having a lot of problems getting multiple nodes talking to each
other; for some reason netty keeps giving me errors.

[01:57:20,724][WARN ][transport.netty ] [Alchemy] Exception
caught on netty layer [[id: 0x11fb24d3]]
java.nio.channels.UnresolvedAddressException
at sun.nio.ch.Net.checkAddress(Net.java:30)

Now, I'm sure this has to do with the way I've configured my setup, but
for the life of me I can't see what I'm missing.
This is my config file:

network:
    bindHost: storage1.example.com
    publishHost: storage1.example.com
transport:
    netty:
        port: 9300
cluster:
    name: StorageIndexer
discovery:
    jgroups:
        config: tcp
        bind_port: 9400
        tcpping:
            initial_hosts: storage1.example.com[9400],storage2.example.com[9400]

Gareth_Stokes · March 30, 2010, 5:15am

Found the problem: I didn't have discovery.jgroups.bind_addr set.
Here is the config that worked, in case anyone else has the same
problem:

network:
    bindHost: storage1.example.com
    publishHost: storage1.example.com
transport:
    netty:
        port: 9400
http:
    netty:
        enabled: true
        port: 9401
cluster:
    name: ExampleIndexer
discovery:
    jgroups:
        config: tcp
        bind_port: 9700
        bind_addr: storage1.example.com
        tcpping:
            initial_hosts: storage1.example.com[9700],storage2.example.com[9700]
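
Editor's note: the initial_hosts value above is a comma-separated list of host[port] entries, and a malformed entry is easy to miss by eye. The following is a minimal sketch in Python for sanity-checking such a string before putting it into the config. The helper name parse_initial_hosts is hypothetical and is not part of elasticsearch or jgroups; the accepted format is assumed from the configs in this thread only.

```python
import re

# Matches one entry of the form host[port], e.g. storage1.example.com[9700].
# Whitespace around the entry is tolerated; whitespace inside it is not.
_ENTRY = re.compile(r"^\s*([^\[\]\s,]+)\[(\d+)\]\s*$")


def parse_initial_hosts(value):
    """Split an initial_hosts string into a list of (host, port) tuples.

    Raises ValueError for any entry that is not in host[port] form.
    """
    pairs = []
    for entry in value.split(","):
        match = _ENTRY.match(entry)
        if match is None:
            raise ValueError(f"not in host[port] form: {entry!r}")
        pairs.append((match.group(1), int(match.group(2))))
    return pairs


if __name__ == "__main__":
    hosts = parse_initial_hosts(
        "storage1.example.com[9700],storage2.example.com[9700]"
    )
    for host, port in hosts:
        print(host, port)
```

This only checks the shape of the value. It says nothing about whether the hosts resolve or whether the ports are reachable.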

kimchy · March 30, 2010, 7:09am

I am not sure that the exception you posted in the first mail relates to the
updated configuration, since the exception is from the netty layer (the
transport) and the jgroups fix is for the discovery layer.

In general, you don't have to set bind_addr, since it should default to
network.bindHost (assuming both use IPv4/IPv6).

-shay.banon
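
Editor's note: the IPv4/IPv6 caveat above can be checked from the node itself. The following is a rough sketch in Python, assuming that the system resolver on the node reflects what the JVM will see (which may not hold if the JVM is started with its own network stack preferences). The function name address_families is hypothetical; socket.getaddrinfo is the standard library call.

```python
import socket

# Human-readable names for the two address families of interest.
_NAMES = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}


def address_families(host, resolve=socket.getaddrinfo):
    """Return the sorted IP family names that host resolves to.

    resolve is injectable so the function can be exercised without a
    network; it must behave like socket.getaddrinfo(host, port).
    """
    families = set()
    for family, _type, _proto, _canon, _sockaddr in resolve(host, None):
        families.add(_NAMES.get(family, str(family)))
    return sorted(families)


if __name__ == "__main__":
    # Replace "localhost" with the value used for network.bindHost.
    print(address_families("localhost"))
```

If the bind host resolves to both families on one machine and to only one on another, that difference is worth ruling out before looking further.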

Gareth_Stokes · March 30, 2010, 11:54pm

I did think that was bizarre, and it was only happening on one machine in
the cluster, so I can't exactly replicate the issue. I just thought I would
document it in case anyone else has the same problem.

kimchy · March 31, 2010, 7:50am

Yes, very strange. I really hope to get around this when I upgrade to the
upcoming jgroups version (which is still in alpha stage, so I am waiting).

-shay.banon

alexandre_gerlic · April 18, 2010, 11:45pm

Hi,
I have the same issue on one of my nodes:
Exception caught on netty layer
java.nio.channels.UnresolvedAddressException
at sun.nio.ch.Net.checkAddress(Net.java:30)

I tried your fix without success.

Do you have any update on this issue?

Thx

--
Alexandre Gerlic

kimchy · April 19, 2010, 8:17am

Can you post the full stack trace? Are you running in an embedded mode?

cheers,
shay.banon

alexandre_gerlic · April 19, 2010, 5:58pm

Hi, thanks. More information below.

Stack trace: https://gist.github.com/371357

ip1 config file :
cluster:
    name: clustername
network:
    bindHost: ip1
    publishHost: ip1
index.engine.robin.refreshInterval: -1
index.gateway.snapshot_interval: -1
index.gateway.fs.location: /path
index.gateway.type: fs
index.number_of_shards: 5
index.number_of_replicas: 1

index:
    store:
        fs:
            memory:
                enabled: true
discovery:
    jgroups:
        config: tcp
        bind_port: 9700
        tcpping:
            initial_hosts: ip1[9700], ip2[9700]

My first node runs on Ubuntu 8.04 without problems;
the second one, on Ubuntu 9.10, throws this exception.

When I call:
http://ip1:9200/_cluster/nodes : only 1 node
http://ip2:9200/_cluster/nodes : 2 nodes

I checked the firewall and it seems OK.

--
Alexandre Gerlic

kimchy · April 19, 2010, 6:09pm

It seems like the host you provide in the configuration can't be
resolved. Something with the network configuration?

As a side note, it makes little sense to configure a gateway for an index
without defining a gateway for the whole cluster. You should configure the
gateway in the following manner:

gateway:
    type: fs
    fs.location: /path

This will cause any index created on the node to use the fs gateway
automatically.

The "top level" gateway is important since it stores all the cluster meta
data, such as indices created, mappings, and so on.

cheers,
shay.banon

alexandre_gerlic · April 19, 2010, 6:26pm

Thanks, I updated my config file.
My stack trace was not complete, so I updated it:

https://gist.github.com/371357

[20:21:46,972][INFO ][node                     ] [Mastermind] {ElasticSearch/0.6.0}: Started
[20:22:09,622][INFO ][cluster.service          ] [Mastermind] Added {[Ch'od][hostname-55866][data][inet[hostname:9300]],}
[20:22:09,625][WARN ][transport.netty          ] [Mastermind] Exception caught on netty layer [[id: 0x305e9d7a]]
java.nio.channels.UnresolvedAddressException
	at sun.nio.ch.Net.checkAddress(Net.java:30)
	at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:487)
	at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink.connect(NioClientSocketPipelineSink.java:139)
	at org.jboss.netty.channel.socket.nio.NioClientSocketPipelineSink.eventSunk(NioClientSocketPipelineSink.java:103)
	at org.jboss.netty.channel.Channels.connect(Channels.java:784)
	at org.jboss.netty.channel.AbstractChannel.connect(AbstractChannel.java:188)

(The trace is truncated here; the full output is in the gist.)

I will continue to investigate. It is very strange; my network seems fine.

--
Alexandre Gerlic

alexandre_gerlic · April 19, 2010, 8:36pm

Some more info:
it seems to be the same issue as:
http://github.com/elasticsearch/elasticsearch/issues/labels/bug#issue/40

I downloaded the latest version and installed it on my bad node.

New error:
[21:48:36,042][WARN ][jgroups.pbcast.NAKACK ] host1-7963: dropped
message from host2-58528 (not in xmit_table), keys are [host1-7963],
view=[host1-7963|0] [host1-7963]

I tried changing the sh script to add:
java.net.preferIPv4Stack=false
java.net.preferIPv6Stack=false

but a new error appeared:
[WARN ][jgroups.TCP] no physical address ....

It seems to be a jgroups issue.

--
Alexandre Gerlic

alexandre_gerlic · April 19, 2010, 11:26pm

OK, finally I modified my hosts file to add my node, and it is now working.
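
Editor's note: since the fix here was a hosts file entry, a quick way to spot this class of problem is to check, on each node, that every other node's name resolves. The following is a minimal sketch in Python, not anything elasticsearch itself provides. The function name unresolved_hosts and the example host names are placeholders. socket.getaddrinfo goes through the system resolver, which on a typical Linux setup consults the hosts file as well as DNS, but that depends on how the machine is configured.

```python
import socket


def unresolved_hosts(hosts, resolve=socket.getaddrinfo):
    """Return the hosts that the local resolver cannot look up.

    resolve is injectable so the function can be exercised without a
    network; it must behave like socket.getaddrinfo(host, port).
    """
    missing = []
    for host in hosts:
        try:
            resolve(host, None)
        except socket.gaierror:
            # Name lookup failed on this machine.
            missing.append(host)
    return missing


if __name__ == "__main__":
    # Placeholder names; substitute the host names your nodes publish.
    nodes = ["storage1.example.com", "storage2.example.com"]
    for host in unresolved_hosts(nodes):
        print("cannot resolve:", host)
```

Run it on every machine in the cluster, not just one. In this thread the failure showed up on a single node, so a check from a working node would have reported nothing.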

alexandre_gerlic · April 21, 2010, 1:13pm

I spoke too soon:
in fact, it works for a while until I receive:

host1 :
[13:33:20,054][WARN ][jgroups.FD ] I was suspected by
host2-28908; ignoring the SUSPECT message and sending back a
HEARTBEAT_ACK
[13:33:20,274][INFO ][cluster.service ] [Nighthawk] Master
{New [Nighthawk][host1-34919][data][inet[host1.domain1.com/ip1:9300]],
Previous [Anole][host2-28908][data][inet[sd-5175/ip2:9300]]}, Removed
{[Anole][host2-28908][data][inet[host2/ip2:9300]],}
Exception in thread "elasticsearch[Nighthawk][tp]-pool-1-thread-9"
java.lang.NullPointerException
at org.elasticsearch.transport.SendRequestTransportException.<init>(SendRequestTransportException.java:30)
at org.elasticsearch.transport.TransportService$2.run(TransportService.java:152)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
[13:33:50,288][WARN ][jgroups.pbcast.NAKACK ] host1-34919: dropped
message from host2-28908 (not in xmit_table), keys are [host1-34919],
view=[host1-34919|2] [host1-34919]

host2:
[13:33:32,332][WARN ][jgroups.FD ] I was suspected by
host1-34919; ignoring the SUSPECT message and sending back a
HEARTBEAT_ACK
[13:33:32,335][WARN ][jgroups.pbcast.GMS ] host2-28908: not
member of view [host1-34919|2] [host1-34919]; discarding it
[13:33:32,601][WARN ][discovery.jgroups ] [Anole] Received a
wrong address type from [host1-34919], ignoring...
(received_address[org.elasticsearch.util.transport.DummyTransportAddress@e8d404)
[13:33:32,674][WARN ][discovery.jgroups ] [Anole] Received a
wrong address type from [host1-34919], ignoring...
(received_address[org.elasticsearch.util.transport.DummyTransportAddress@f278dd)
[13:33:32,675][WARN ][discovery.jgroups ] [Anole] Received a
wrong address type from [host1-34919], ignoring...
(received_address[org.elasticsearch.util.transport.DummyTransportAddress@dd3333)
[13:33:32,675][WARN ][discovery.jgroups ] [Anole] Received a
wrong address type from [host1-34919], ignoring...
(received_address[org.elasticsearch.util.transport.DummyTransportAddress@4c963c)

Everything works fine (_cluster/nodes shows 2 nodes) until the
first error message; then
_cluster/nodes shows only 1 node,
and the nodes are not able to reconnect to each other.
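Since the transport (netty) and discovery (jgroups) layers log under different names, it may help to tally the warnings per logger and see which layer complains first. This is a rough sketch that assumes the `[time][LEVEL][logger] message` layout seen in the excerpts above. `tally_warnings` is a hypothetical helper, not something shipped with elasticsearch.

```python
import re
from collections import Counter

# Matches the start of an entry such as
# "[13:33:20,054][WARN ][jgroups.FD ] I was suspected by"
ENTRY = re.compile(r"^\[(\d{2}:\d{2}:\d{2},\d{3})\]\[(\w+)\s*\]\[([\w.]+)\s*\]")


def tally_warnings(lines):
    """Count WARN entries per logger name; wrapped continuation lines are skipped."""
    counts = Counter()
    for line in lines:
        match = ENTRY.match(line)
        if match and match.group(2) == "WARN":
            counts[match.group(3)] += 1
    return counts
```

Feeding it the lines of a node's log file (for example `tally_warnings(open(path))`, with `path` being wherever your log lives) gives a quick per-layer count.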

kimchy · April 21, 2010, 5:41pm

This is a very strange behavior that I sometimes get with jgroups and still
have not managed to reproduce. I am working on a workaround for this.
Remind me, are you using udp or tcp with jgroups?

cheers,
shay.banon

alexandre_gerlic · April 21, 2010, 6:21pm

I am using tcp and the binary release 0.6.0:
discovery:
  jgroups:
    config: tcp
    bind_port: 9700
    tcpping:
      initial_hosts: ip1[9700], ip2[9700]

Thanks for your help.
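The `initial_hosts` value uses the jgroups `host[port]` notation. Here is a small sketch of splitting such a value into host/port pairs, for instance to feed a resolution or connectivity check. `parse_initial_hosts` is a hypothetical helper, and the format handling is inferred only from the examples in this thread.

```python
import re

# One entry such as "ip1[9700]", with optional surrounding whitespace.
HOST_PORT = re.compile(r"^\s*([^\[\]\s,]+)\[(\d+)\]\s*$")


def parse_initial_hosts(value):
    """Split 'ip1[9700], ip2[9700]' into [('ip1', 9700), ('ip2', 9700)]."""
    pairs = []
    for item in value.split(","):
        match = HOST_PORT.match(item)
        if match is None:
            raise ValueError(f"not in host[port] form: {item!r}")
        pairs.append((match.group(1), int(match.group(2))))
    return pairs
```

A malformed entry raises `ValueError` rather than being skipped, so a typo in the list is noticed early.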

alexandre_gerlic · April 22, 2010, 9:19am

OK, finally: after the node discovery crash, I stopped my first node,
which created the gateway fs files during shutdown.
Then I restarted it.
It seems to be working: my 2 nodes are still connected after 1 day.
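To keep an eye on whether the nodes stay connected, one could poll `_cluster/nodes` on each node (as in the `http://ip1:9200/_cluster/nodes` calls earlier in the thread) and compare what each one reports. The sketch below assumes the response is a JSON object with a "nodes" object keyed by node id; that shape is an assumption and may differ between versions. All three function names are hypothetical.

```python
import json
from urllib.request import urlopen


def node_count(payload):
    """Number of nodes in a parsed _cluster/nodes response.

    Assumes a top-level "nodes" object keyed by node id (an assumption,
    adjust if your version answers differently).
    """
    return len(payload.get("nodes", {}))


def fetch_node_count(base_url, timeout=5):
    """GET <base_url>/_cluster/nodes and count the nodes it lists."""
    with urlopen(f"{base_url}/_cluster/nodes", timeout=timeout) as response:
        return node_count(json.load(response))


def views_agree(counts):
    """True when every polled node reports the same cluster size."""
    return len(set(counts.values())) <= 1
```

For example, `views_agree({"ip1": 1, "ip2": 2})` is false, which matches the split view described earlier, where one node saw only itself.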

2010/4/21 alexandre gerlic alexandre.gerlic@gmail.com:

I am using tcp and binary release : 0.6.0
discovery:
jgroups:
config: tcp
bind_port: 9700
tcpping:
initial_hosts: ip1[9700], ip2[9700]

thx for your help

2010/4/21 Shay Banon shay.banon@elasticsearch.com:

This is a very strange behavior that I get with jgroups sometimes and still
have not managed to recreate it. I am working on a workaround for this.
Remind me, are you using udp or tcp with jgroups?
cheers,
shay.banon

On Wed, Apr 21, 2010 at 4:13 PM, alexandre gerlic
alexandre.gerlic@gmail.com wrote:

Too fast :
in fact it its working a while until I receiv :

host1 :
[13:33:20,054][WARN ][jgroups.FD ] I was suspected by
host2-28908; ignoring the SUSPECT message and sending back a
HEARTBEAT_ACK
[13:33:20,274][INFO ][cluster.service ] [Nighthawk] Master
{New [Nighthawk][host1-34919][data][inet[host1.domain1.com/ip1:9300]],
Previous [Anole][host2-28908][data][inet[sd-5175/ip2:9300]]}, Removed
{[Anole][host2-28908][data][inet[host2/ip2:9300]],}
Exception in thread "elasticsearch[Nighthawk][tp]-pool-1-thread-9"
java.lang.NullPointerException
at
org.elasticsearch.transport.SendRequestTransportException.(SendRequestTransportException.java:30)
at
org.elasticsearch.transport.TransportService$2.run(TransportService.java:152)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
[13:33:50,288][WARN ][jgroups.pbcast.NAKACK ] host1-34919: dropped
message from host2-28908 (not in xmit_table), keys are [host1-34919],
view=[host1-34919|2] [host1-34919]

host2:
[13:33:32,332][WARN ][jgroups.FD ] I was suspected by
host1-34919; ignoring the SUSPECT message and sending back a
HEARTBEAT_ACK
[13:33:32,335][WARN ][jgroups.pbcast.GMS ] host2-28908: not
member of view [host1-34919|2] [host1-34919]; discarding it
[13:33:32,601][WARN ][discovery.jgroups ] [Anole] Received a
wrong address type from [host1-34919], ignoring...

(received_address[org.elasticsearch.util.transport.DummyTransportAddress@e8d404)
[13:33:32,674][WARN ][discovery.jgroups ] [Anole] Received a
wrong address type from [host1-34919], ignoring...

(received_address[org.elasticsearch.util.transport.DummyTransportAddress@f278dd)
[13:33:32,675][WARN ][discovery.jgroups ] [Anole] Received a
wrong address type from [host1-34919], ignoring...

(received_address[org.elasticsearch.util.transport.DummyTransportAddress@dd3333)
[13:33:32,675][WARN ][discovery.jgroups ] [Anole] Received a
wrong address type from [host1-34919], ignoring...

(received_address[org.elasticsearch.util.transport.DummyTransportAddress@4c963c)

Everything works fine (_cluster/nodes shows 2 nodes) until the
first error message; after that, _cluster/nodes shows only 1 node
and the nodes are not able to reconnect to each other.

2010/4/20 alexandre gerlic alexandre.gerlic@gmail.com:

OK, I finally modified my hosts file to add my node, and it is now working.
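
Since the fix here was a hosts entry and the original error was java.nio.channels.UnresolvedAddressException, it can help to check whether the JVM itself can resolve each configured host name before starting a node. The following is a minimal sketch using only java.net; the class name ResolveCheck and the helper canResolve are hypothetical (not part of elasticsearch), and the default host names and port 9300 are just the ones used in the configs in this thread.

```java
import java.net.InetSocketAddress;

class ResolveCheck {
    // Returns true if the JVM can resolve the host name to an IP address.
    // The InetSocketAddress(String, int) constructor attempts the lookup and
    // marks the address as unresolved when it fails; connecting a channel to
    // an unresolved address is what raises UnresolvedAddressException.
    static boolean canResolve(String host, int port) {
        return !new InetSocketAddress(host, port).isUnresolved();
    }

    public static void main(String[] args) {
        // Hosts to check: command-line arguments, or the example names from this thread.
        String[] hosts = args.length > 0
                ? args
                : new String[] {"storage1.example.com", "storage2.example.com"};
        for (String host : hosts) {
            String status = canResolve(host, 9300) ? "resolved" : "UNRESOLVED";
            System.out.println(host + " -> " + status);
        }
    }
}
```

Running it on each machine with the values from bindHost, publishHost and initial_hosts shows which names that machine cannot resolve; any that print UNRESOLVED would need a DNS or hosts entry.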

2010/4/19 alexandre gerlic alexandre.gerlic@gmail.com:

Some more info:
it seems to be the same issue as:

http://github.com/elasticsearch/elasticsearch/issues/labels/bug#issue/40

I downloaded the latest version and installed it on my bad node.

New error:
[21:48:36,042][WARN ][jgroups.pbcast.NAKACK ] host1-7963: dropped message from host2-58528 (not in xmit_table), keys are [host1-7963], view=[host1-7963|0] [host1-7963]

I tried changing the sh script to add:
java.net.preferIPv4Stack=false
java.net.preferIPv6Stack=false

but a new error appeared:
[WARN ][jgroups.TCP] no physical address ....

It seems to be a jgroups issue.

2010/4/19 alexandre gerlic alexandre.gerlic@gmail.com:

Thanks, I updated my config file.
My stack trace was not complete, so I updated it:
371357’s gists · GitHub
I will continue to investigate. It is very strange; my network seems
fine.

2010/4/19 Shay Banon shay.banon@elasticsearch.com:

It seems like the host you provide in the configuration can't be
resolved. Something with the network configuration?
As a side note, it makes little sense to configure a gateway for an
index without defining a gateway for the whole cluster. You should
configure the gateway in the following manner:
gateway:
  type: fs
  fs.location: /path
This will cause any index created on the node to use the fs gateway
automatically.
The "top level" gateway is important since it stores all the cluster
metadata, such as indices created, mappings, and so on.
cheers,
shay.banon

On Mon, Apr 19, 2010 at 8:58 PM, alexandre gerlic
alexandre.gerlic@gmail.com wrote:

Hi, thanks. More information below.

stack : 371357’s gists · GitHub

ip1 config file:
cluster:
  name: clustername
network:
  bindHost: ip1
  publishHost: ip1
index.engine.robin.refreshInterval: -1
index.gateway.snapshot_interval: -1
index.gateway.fs.location: /path
index.gateway.type: fs
index.number_of_shards: 5
index.number_of_replicas: 1

index:
  store:
    fs:
      memory:
        enabled: true
discovery:
  jgroups:
    config: tcp
    bind_port: 9700
    tcpping:
      initial_hosts: ip1[9700], ip2[9700]

My first node runs on Ubuntu 8.04 without problems;
the second one, on Ubuntu 9.10, throws this exception.

When I call:
http://ip1:9200/_cluster/nodes : only 1 node
http://ip2:9200/_cluster/nodes : 2 nodes

I checked the firewall and it seems OK.

2010/4/19 Shay Banon shay.banon@elasticsearch.com:

Can you post the full stack trace? Are you running in embedded mode?
cheers,
shay.banon

On Mon, Apr 19, 2010 at 2:45 AM, alexandre gerlic
alexandre.gerlic@gmail.com wrote:

Hi,
I have the same issue on one of my nodes:
Exception caught on netty layer
java.nio.channels.UnresolvedAddressException
    at sun.nio.ch.Net.checkAddress(Net.java:30)

I tried your fix without success.

Do you have any update on this issue?

Thanks

2010/3/31 Shay Banon shay.banon@elasticsearch.com:

Yes, very strange. I really hope to get around this when I upgrade to
the upcoming jgroups version (which is still in alpha stage, so I am
waiting).
-shay.banon

On Wed, Mar 31, 2010 at 2:54 AM, Gareth Stokes
gareth@betechnology.com.au wrote:

I did think that was bizarre, and it was only happening on one machine
in the cluster, so I can't exactly replicate the issue. I just thought I
would document it in case anyone else has the same problem.

On 30 March 2010 18:09, Shay Banon
shay.banon@elasticsearch.com wrote:

I am not sure that the exception you posted in the first mail relates
to the updated configuration, since the exception is from the netty
layer (the transport) and the jgroups fix is for the discovery layer.
In general, you don't have to set the bind_addr, since it should
default to the network.bindHost (assuming both use ipv4/ipv6).
-shay.banon
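
The "assuming both use ipv4/ipv6" caveat above depends on which IP family a host name actually resolves to on a given machine. One way to inspect that is to list the addresses the JVM returns for the name. This is a minimal sketch using only java.net; the class name AddressFamilies and the helper describe are hypothetical (not part of elasticsearch or jgroups), and the default host is a placeholder to be replaced with the value of network.bindHost.

```java
import java.net.Inet4Address;
import java.net.InetAddress;
import java.net.UnknownHostException;

class AddressFamilies {
    // Returns one line per address the JVM resolves for the host,
    // each prefixed with its IP family, or "unresolved" if lookup fails.
    static String describe(String host) {
        try {
            StringBuilder sb = new StringBuilder();
            for (InetAddress a : InetAddress.getAllByName(host)) {
                String family = (a instanceof Inet4Address) ? "IPv4" : "IPv6";
                sb.append(family).append(' ').append(a.getHostAddress()).append('\n');
            }
            return sb.toString();
        } catch (UnknownHostException e) {
            return "unresolved\n";
        }
    }

    public static void main(String[] args) {
        // Placeholder default; pass the configured bind host as the first argument.
        String host = args.length > 0 ? args[0] : "localhost";
        // Shows whether the JVM was started with the IPv4-only preference set.
        System.out.println("java.net.preferIPv4Stack="
                + System.getProperty("java.net.preferIPv4Stack"));
        System.out.print(describe(host));
    }
}
```

If the transport host and the discovery bind address come back in different families on one machine, that mismatch would be worth ruling out before setting bind_addr by hand.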

--
Alexandre Gerlic

kimchy · April 23, 2010, 3:34pm

Hi,

Great that things are finally working for you! I am working on a new
discovery module that will replace the jgroups one, which I hope will
eliminate the problems you were facing...

cheers,
shay.banon

On Thu, Apr 22, 2010 at 12:19 PM, alexandre gerlic
<alexandre.gerlic@gmail.com> wrote:

OK, finally: after the node discovery crash, I stopped my first node,
which created the gateway fs files during shutdown.
Then I restarted it.
It seems to be working: my 2 nodes are still connected after 1 day.

2010/4/21 alexandre gerlic alexandre.gerlic@gmail.com:

I am using tcp and the binary release, 0.6.0:
discovery:
  jgroups:
    config: tcp
    bind_port: 9700
    tcpping:
      initial_hosts: ip1[9700], ip2[9700]

Thanks for your help

2010/4/21 Shay Banon shay.banon@elasticsearch.com:

This is a very strange behavior that I sometimes get with jgroups and
still have not managed to recreate. I am working on a workaround for this.
Remind me, are you using udp or tcp with jgroups?
cheers,
shay.banon

Sergio_Bossa · April 23, 2010, 4:26pm

On Fri, Apr 23, 2010 at 5:34 PM, Shay Banon
shay.banon@elasticsearch.com wrote:

Great that things are finally working for you! I am working on a new
discovery module that will replace the jgroups one which I hope will
eliminate the problems you were facing...

Sounds great: what are you using for the new discovery module? Or is
it completely written from scratch?

--
Sergio Bossa
http://www.linkedin.com/in/sergiob

kimchy · April 23, 2010, 9:20pm

Completely from scratch, utilizing the built-in components in elasticsearch
(like the transport). I am also trying to build one that has pluggable support
for the "cloud" (more on that later...).
