Configuring multiple nodes


(Gareth Stokes) #1

I'm having a lot of problems getting multiple nodes talking to each
other; for some reason netty keeps giving me errors.

[01:57:20,724][WARN ][transport.netty ] [Alchemy] Exception caught on netty layer [[id: 0x11fb24d3]]
java.nio.channels.UnresolvedAddressException
    at sun.nio.ch.Net.checkAddress(Net.java:30)

I'm sure this has to do with the way I've configured my setup, but
for the life of me I can't see what I'm missing.
This is my config file:

network :
    bindHost : storage1.example.com
    publishHost : storage1.example.com
transport :
    netty :
        port : 9300
cluster :
    name : StorageIndexer
discovery :
    jgroups :
        config : tcp
        bind_port : 9400
        tcpping :
            initial_hosts : storage1.example.com[9400],storage2.example.com[9400]
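For readers hitting the same warning: `java.nio.channels.UnresolvedAddressException` is what Java NIO throws when it is asked to connect to an `InetSocketAddress` whose host name was never resolved to an IP address, so the netty warning in this post most likely means some host name the node tried to use could not be looked up. The sketch below reproduces the exception without netty or elasticsearch. The class and method names are hypothetical, and `InetSocketAddress.createUnresolved` is used only to simulate a lookup that failed.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SocketChannel;
import java.nio.channels.UnresolvedAddressException;

// Sketch: reproduce "UnresolvedAddressException" without netty or elasticsearch.
final class UnresolvedDemo {

    /** True if connecting fails because the address was never resolved to an IP. */
    static boolean failsAsUnresolved(InetSocketAddress target) {
        try (SocketChannel channel = SocketChannel.open()) {
            channel.connect(target); // NIO validates the address before doing any I/O
            return false;
        } catch (UnresolvedAddressException e) {
            return true; // in OpenJDK this is raised from sun.nio.ch.Net.checkAddress
        } catch (IOException e) {
            return false; // resolved but unreachable (refused, timeout, ...): a different problem
        }
    }

    public static void main(String[] args) {
        // createUnresolved() skips the lookup, mimicking a host name the resolver could not find.
        InetSocketAddress target = InetSocketAddress.createUnresolved("storage1.example.com", 9300);
        System.out.println("isUnresolved: " + target.isUnresolved());
        System.out.println("connect fails as unresolved: " + failsAsUnresolved(target));
    }
}
```

Running `main` should print `true` for both lines. A real deployment reaches the same state when `new InetSocketAddress(host, port)` cannot resolve `host`: the address is silently marked unresolved and the failure only surfaces at connect time.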


(Gareth Stokes) #2

Found the problem: it turned out I didn't have
discovery.jgroups.bind_addr set. Here is the config that worked, in case
anyone else has the same problem:

network:
    bindHost: storage1.example.com
    publishHost: storage1.example.com
transport:
    netty:
        port: 9400
http:
    netty:
        enabled: true
        port: 9401
cluster:
    name: ExampleIndexer
discovery:
    jgroups:
        config: tcp
        bind_port: 9700
        bind_addr: storage1.example.com
        tcpping:
            initial_hosts: storage1.example.com[9700],storage2.example.com[9700]
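JGroups' TCPPING `initial_hosts` takes a comma-separated `host[port]` list, and every host in it has to resolve on every node. A small pre-flight check along these lines can show which entries the local resolver cannot turn into an address before the node is started. It is a hypothetical helper written with plain JDK calls, not part of elasticsearch or JGroups.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;

// Sketch: parse a TCPPING-style "host[port],host[port]" list and report
// which entries the local resolver cannot turn into an IP address.
final class InitialHostsCheck {

    /** Parses "host[port],host[port]" into addresses; each host is looked up eagerly. */
    static List<InetSocketAddress> parse(String initialHosts) {
        List<InetSocketAddress> result = new ArrayList<>();
        for (String raw : initialHosts.split(",")) {
            String entry = raw.trim();
            if (entry.isEmpty()) continue;
            int open = entry.indexOf('[');
            int close = entry.indexOf(']', open);
            if (open <= 0 || close < 0) {
                throw new IllegalArgumentException("expected host[port], got: " + entry);
            }
            String host = entry.substring(0, open).trim();
            int port = Integer.parseInt(entry.substring(open + 1, close).trim());
            // This constructor performs the hosts-file/DNS lookup; on failure the
            // address is marked unresolved instead of throwing.
            result.add(new InetSocketAddress(host, port));
        }
        return result;
    }

    /** Returns the entries that did not resolve, in host[port] form. */
    static List<String> unresolved(String initialHosts) {
        List<String> bad = new ArrayList<>();
        for (InetSocketAddress a : parse(initialHosts)) {
            if (a.isUnresolved()) bad.add(a.getHostString() + "[" + a.getPort() + "]");
        }
        return bad;
    }

    public static void main(String[] args) {
        String hosts = args.length > 0 ? args[0]
                : "storage1.example.com[9700],storage2.example.com[9700]";
        List<String> bad = unresolved(hosts);
        System.out.println(bad.isEmpty() ? "all hosts resolve" : "unresolved: " + bad);
    }
}
```

Running it on each node with the exact `initial_hosts` string from that node's config is the useful part, since name resolution can differ per machine.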


(Shay Banon) #3

I am not sure that the exception you posted in the first mail relates to the
updated configuration, since the exception comes from the netty layer (the
transport) while the jgroups fix is for the discovery layer.

In general, you don't have to set bind_addr, since it should default to
network.bindHost (assuming both use the same address family, IPv4 or IPv6).
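One way to check the address-family assumption is to look at what a host name actually resolves to on each machine. The sketch below (hypothetical names, plain JDK calls only) lists every address returned by `InetAddress.getAllByName` and tags it as IPv4 or IPv6; if the `network.bindHost` name resolves differently on different nodes, that is worth ruling out.

```java
import java.net.Inet4Address;
import java.net.Inet6Address;
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Sketch: show which address families a host name resolves to on this machine.
final class AddressFamilyCheck {

    /** Returns entries like "IPv4 10.0.0.5"; an empty list means the name did not resolve. */
    static List<String> families(String host) {
        List<String> out = new ArrayList<>();
        try {
            for (InetAddress addr : InetAddress.getAllByName(host)) {
                String family = (addr instanceof Inet4Address) ? "IPv4"
                        : (addr instanceof Inet6Address) ? "IPv6" : "other";
                out.add(family + " " + addr.getHostAddress());
            }
        } catch (UnknownHostException e) {
            // Name not found in the hosts file or DNS: leave the list empty.
        }
        return out;
    }

    public static void main(String[] args) {
        // Pass the value of network.bindHost (e.g. storage1.example.com) as the argument.
        String host = args.length > 0 ? args[0] : "localhost";
        List<String> result = families(host);
        System.out.println(host + ": " + (result.isEmpty() ? "does not resolve" : result));
    }
}
```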

-shay.banon


(Gareth Stokes) #4

I did think that was bizarre, and it was only happening on one machine in
the cluster, so I can't exactly replicate the issue. I just thought I would
document it in case anyone else has the same problem.


(Shay Banon) #5

Yes, very strange. I really hope to get around this when I upgrade to the
upcoming jgroups version (which is still in alpha stage, so I am waiting).

-shay.banon


(alexandre gerlic) #6

Hi,
I have the same issue on one of my nodes:

Exception caught on netty layer
java.nio.channels.UnresolvedAddressException
    at sun.nio.ch.Net.checkAddress(Net.java:30)

I tried your fix without success.

Do you have any update on this issue?

Thanks

--
Alexandre Gerlic


(Shay Banon) #7

Can you post the full stack trace? Are you running in embedded mode?

cheers,
shay.banon


(alexandre gerlic) #8

Hi, thanks. More information below.

Stack trace: http://gist.github.com/371357

Config file on ip1:
cluster:
    name: clustername
network:
    bindHost: ip1
    publishHost: ip1
index.engine.robin.refreshInterval: -1
index.gateway.snapshot_interval: -1
index.gateway.fs.location: /path
index.gateway.type: fs
index.number_of_shards : 5
index.number_of_replicas : 1

index :
    store:
        fs:
            memory:
                enabled: true
discovery:
    jgroups:
        config: tcp
        bind_port: 9700
        tcpping:
            initial_hosts: ip1[9700], ip2[9700]

My first node, on Ubuntu 8.04, works without problems; the second one,
on Ubuntu 9.10, throws this exception.

When I call:
http://ip1:9200/_cluster/nodes it returns only 1 node
http://ip2:9200/_cluster/nodes it returns 2 nodes

I checked the firewall and it seems OK.
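Since ip1 sees only itself while ip2 sees both nodes, the cluster view is asymmetric, so it can help to test reachability in both directions rather than only reading the firewall rules. A minimal TCP probe, run from each node against the other node's jgroups `bind_port` (9700 in this config) and its transport port, might look like this. The class name, output strings, and 2-second timeout are assumptions for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Sketch: check whether a TCP port on another node accepts connections from here.
final class PortProbe {

    /** Returns "OPEN", "UNRESOLVED", or "NOT REACHABLE (<exception>)". */
    static String probe(String host, int port, int timeoutMillis) {
        InetSocketAddress target = new InetSocketAddress(host, port); // performs the lookup
        if (target.isUnresolved()) {
            return "UNRESOLVED";
        }
        try (Socket socket = new Socket()) {
            socket.connect(target, timeoutMillis); // refused, filtered, or timed out -> IOException
            return "OPEN";
        } catch (IOException e) {
            return "NOT REACHABLE (" + e.getClass().getSimpleName() + ")";
        }
    }

    public static void main(String[] args) {
        // Hypothetical usage: java PortProbe ip2 9700
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9700;
        System.out.println(host + ":" + port + " -> " + probe(host, port, 2000));
    }
}
```

A result of `UNRESOLVED` points at name resolution rather than the firewall, which matches the kind of failure discussed in this thread.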


(Shay Banon) #9

It seems like the host you provide in the configuration can't be
resolved. Could it be something in the network configuration?

As a side note, it makes little sense to configure a gateway for an index
without defining a gateway for the whole cluster. You should configure the
gateway like this:

gateway:
    type: fs
    fs.location: /path

This will cause any index created on the node to use the fs gateway
automatically.

The "top level" gateway is important since it stores all the cluster
metadata, such as the indices created, mappings, and so on.
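As I understand elasticsearch's settings loading, the YAML file is flattened into dot-separated keys, so the nested `gateway:` block in this reply is equivalent to `gateway.type: fs` plus `gateway.fs.location: /path`, and the `index.gateway.*` lines in the earlier config are the same kind of flat keys. The sketch below illustrates that flattening on a nested `Map` (the structure a YAML parser would typically produce). It is an illustration with hypothetical names, not elasticsearch's actual loader.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: flatten nested settings into dot-separated keys (illustration only).
final class SettingsFlattener {

    /** {gateway={type=fs, fs.location=/path}} becomes gateway.type=fs, gateway.fs.location=/path */
    static Map<String, String> flatten(Map<String, ?> nested) {
        Map<String, String> flat = new LinkedHashMap<>();
        flattenInto("", nested, flat);
        return flat;
    }

    private static void flattenInto(String prefix, Map<String, ?> node, Map<String, String> out) {
        for (Map.Entry<String, ?> entry : node.entrySet()) {
            String key = prefix.isEmpty() ? entry.getKey() : prefix + "." + entry.getKey();
            Object value = entry.getValue();
            if (value instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, ?> child = (Map<String, ?>) value;
                flattenInto(key, child, out); // recurse into nested blocks
            } else {
                out.put(key, String.valueOf(value)); // leaf value
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> gateway = new LinkedHashMap<>();
        gateway.put("type", "fs");
        gateway.put("fs.location", "/path");
        Map<String, Object> root = new LinkedHashMap<>();
        root.put("gateway", gateway);
        System.out.println(flatten(root)); // {gateway.type=fs, gateway.fs.location=/path}
    }
}
```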

cheers,
shay.banon


(alexandre gerlic) #10

Thanks, I updated my config file.
My stack trace was not complete, so I updated it:
http://gist.github.com/371357

I will continue to investigate. It is very strange; my network seems fine.


(alexandre gerlic) #11

Some more info:
it seems to be the same issue as
http://github.com/elasticsearch/elasticsearch/issues/labels/bug#issue/40

I downloaded the latest version and installed it on my problem node.

New error:
[21:48:36,042][WARN ][jgroups.pbcast.NAKACK ] host1-7963: dropped message from host2-58528 (not in xmit_table), keys are [host1-7963], view=[host1-7963|0] [host1-7963]

I tried changing the sh startup script to add:
java.net.preferIPv4Stack=false
java.net.preferIPv6Stack=false

but a new error appeared:
[WARN ][jgroups.TCP] no physical address ....

It seems to be a JGroups issue.
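The two lines in this post are property names; to take effect they have to reach the JVM as `-D` options (for example `-Djava.net.preferIPv4Stack=true`) wherever the startup script passes JVM options. Note also that the IPv6 preference property documented by the JDK is `java.net.preferIPv6Addresses`; as far as I know `java.net.preferIPv6Stack` is not a name the JDK reads. The sketch below (hypothetical class name) prints what the running JVM actually sees for the two documented properties and flags any other names passed to it.

```java
import java.util.Arrays;
import java.util.List;

// Sketch: report the JDK networking properties that control IPv4/IPv6 preference.
final class IpStackProperties {

    // Properties from the JDK's networking-properties documentation; both default to false.
    static final List<String> KNOWN =
            Arrays.asList("java.net.preferIPv4Stack", "java.net.preferIPv6Addresses");

    /** Value the running JVM sees for a property, "false" if it was never set. */
    static String effective(String name) {
        return System.getProperty(name, "false");
    }

    /** True only for the two IPv4/IPv6 preference properties listed above. */
    static boolean isKnown(String name) {
        return KNOWN.contains(name);
    }

    public static void main(String[] args) {
        for (String name : KNOWN) {
            System.out.println(name + "=" + effective(name));
        }
        // Flag names passed on the command line that are not one of the two properties.
        for (String name : args) {
            if (!isKnown(name)) {
                System.out.println(name + " is not one of the IPv4/IPv6 preference properties");
            }
        }
    }
}
```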


(alexandre gerlic) #12

OK, I finally modified my hosts file to add my node, and it is now working.
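Since the fix was a hosts-file entry, a quick check before starting a node is to ask the JVM how it resolves the machine's own host name: `InetAddress.getLocalHost()` throws `UnknownHostException` when that name is in neither the hosts file nor DNS. It is also worth noticing a loopback result; Debian-based systems, for example, often map the machine's own name to 127.0.1.1, which other nodes cannot reach. A minimal sketch, with a hypothetical class name:

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Sketch: check how this machine's own host name resolves.
final class LocalHostCheck {

    /** Describes the local host name's address, or why it could not be resolved. */
    static String describeLocalHost() {
        try {
            InetAddress local = InetAddress.getLocalHost(); // consults the hosts file / DNS
            String line = local.getHostName() + " -> " + local.getHostAddress();
            if (local.isLoopbackAddress()) {
                // e.g. 127.0.1.1: valid locally, but not reachable from other nodes
                line += " (loopback address)";
            }
            return line;
        } catch (UnknownHostException e) {
            return "local host name does not resolve: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(describeLocalHost());
    }
}
```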


(alexandre gerlic) #13

I spoke too soon: in fact it works for a while, until I receive the following.

host1:
[13:33:20,054][WARN ][jgroups.FD ] I was suspected by host2-28908; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
[13:33:20,274][INFO ][cluster.service ] [Nighthawk] Master {New [Nighthawk][host1-34919][data][inet[host1.domain1.com/ip1:9300]], Previous [Anole][host2-28908][data][inet[sd-5175/ip2:9300]]}, Removed {[Anole][host2-28908][data][inet[host2/ip2:9300]],}
Exception in thread "elasticsearch[Nighthawk][tp]-pool-1-thread-9" java.lang.NullPointerException
    at org.elasticsearch.transport.SendRequestTransportException.<init>(SendRequestTransportException.java:30)
    at org.elasticsearch.transport.TransportService$2.run(TransportService.java:152)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
[13:33:50,288][WARN ][jgroups.pbcast.NAKACK ] host1-34919: dropped message from host2-28908 (not in xmit_table), keys are [host1-34919], view=[host1-34919|2] [host1-34919]

host2:
[13:33:32,332][WARN ][jgroups.FD ] I was suspected by host1-34919; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
[13:33:32,335][WARN ][jgroups.pbcast.GMS ] host2-28908: not member of view [host1-34919|2] [host1-34919]; discarding it
[13:33:32,601][WARN ][discovery.jgroups ] [Anole] Received a wrong address type from [host1-34919], ignoring... (received_address[org.elasticsearch.util.transport.DummyTransportAddress@e8d404)
[13:33:32,674][WARN ][discovery.jgroups ] [Anole] Received a wrong address type from [host1-34919], ignoring... (received_address[org.elasticsearch.util.transport.DummyTransportAddress@f278dd)
[13:33:32,675][WARN ][discovery.jgroups ] [Anole] Received a wrong address type from [host1-34919], ignoring... (received_address[org.elasticsearch.util.transport.DummyTransportAddress@dd3333)
[13:33:32,675][WARN ][discovery.jgroups ] [Anole] Received a wrong address type from [host1-34919], ignoring... (received_address[org.elasticsearch.util.transport.DummyTransportAddress@4c963c)

Everything works fine (_cluster/nodes shows 2 nodes) until the first error message. After that, _cluster/nodes shows only 1 node, and the nodes are not able to reconnect to each other.
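When _cluster/nodes reports a different membership depending on which node you ask, the nodes disagree about the cluster view. The sketch below shows, for each node, which members it is missing compared with the union of all reported views.

It rests on these assumptions:
- You transcribe each node's member list by hand from its _cluster/nodes output.
- The node names and sample map are hypothetical.
- No elasticsearch or jgroups API is called.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class ClusterViewCheck {

    /**
     * For each node, returns the members it is missing compared with the union
     * of all reported views. An empty result means every node agrees.
     */
    static Map<String, Set<String>> missingMembers(Map<String, Set<String>> views) {
        // Union of every member that any node has reported.
        Set<String> all = new TreeSet<>();
        for (Set<String> view : views.values()) {
            all.addAll(view);
        }
        Map<String, Set<String>> missing = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : views.entrySet()) {
            Set<String> gap = new TreeSet<>(all);
            gap.removeAll(e.getValue());
            if (!gap.isEmpty()) {
                missing.put(e.getKey(), gap);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Hypothetical sample: ip1 only sees itself, ip2 sees both nodes.
        Map<String, Set<String>> views = new TreeMap<>();
        views.put("ip1", Set.of("ip1"));
        views.put("ip2", Set.of("ip1", "ip2"));
        System.out.println(missingMembers(views)); // prints {ip1=[ip2]}
    }
}
```

A non-empty result for only one node, as in the sample, points at that node's discovery or name resolution rather than at the cluster as a whole.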


(Shay Banon) #14

This is a very strange behavior that I sometimes get with jgroups, and I still have not managed to reproduce it. I am working on a workaround for this.
Remind me, are you using UDP or TCP with jgroups?

cheers,
shay.banon


(alexandre gerlic) #15

I am using TCP, with the 0.6.0 binary release:

discovery:
  jgroups:
    config: tcp
    bind_port: 9700
    tcpping:
      initial_hosts: ip1[9700], ip2[9700]

Thanks for your help.
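The initial_hosts value for TCPPING is a comma-separated list of host[port] entries. The sketch below splits such a value and rejects malformed entries, so the host names can then be checked for resolution before a node is started.

It rests on these assumptions:
- It is not JGroups' own parser.
- It tolerates spaces after commas, as in the config above. Whether JGroups itself does is not confirmed here.
- Addresses are created unresolved, so parsing never triggers a lookup.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InitialHostsParser {

    // One "host[port]" entry, e.g. "ip1[9700]" or "storage1.example.com[9700]".
    private static final Pattern ENTRY = Pattern.compile("([^\\[\\],\\s]+)\\[(\\d+)\\]");

    /** Splits a comma-separated initial_hosts value into unresolved host/port pairs. */
    static List<InetSocketAddress> parse(String initialHosts) {
        List<InetSocketAddress> result = new ArrayList<>();
        for (String raw : initialHosts.split(",")) {
            String entry = raw.trim();
            if (entry.isEmpty()) {
                continue; // tolerate a trailing comma
            }
            Matcher m = ENTRY.matcher(entry);
            if (!m.matches()) {
                throw new IllegalArgumentException("Expected host[port] but got: " + entry);
            }
            // createUnresolved performs no DNS or hosts-file lookup.
            result.add(InetSocketAddress.createUnresolved(m.group(1), Integer.parseInt(m.group(2))));
        }
        return result;
    }

    public static void main(String[] args) {
        for (InetSocketAddress a : parse("ip1[9700], ip2[9700]")) {
            System.out.println(a.getHostString() + " -> port " + a.getPort());
        }
    }
}
```

Keeping parsing separate from resolution makes a typo in the list (such as ip1:9700 instead of ip1[9700]) fail loudly, rather than surfacing later as a discovery problem.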


(alexandre gerlic) #16

OK, finally: after the node discovery crash, I stopped my first node, which created the gateway fs files during shutdown.
Then I restarted it.
It seems to be working now: my 2 nodes are still connected after 1 day.

2010/4/21 alexandre gerlic alexandre.gerlic@gmail.com:

I am using tcp and binary release : 0.6.0
discovery:
jgroups:
config: tcp
bind_port: 9700
tcpping:
initial_hosts: ip1[9700], ip2[9700]

thx for your help

2010/4/21 Shay Banon shay.banon@elasticsearch.com:

This is a very strange behavior that I get with jgroups sometimes and still
have not managed to recreate it. I am working on a workaround for this.
Remind me, are you using udp or tcp with jgroups?
cheers,
shay.banon

On Wed, Apr 21, 2010 at 4:13 PM, alexandre gerlic
alexandre.gerlic@gmail.com wrote:

Too fast :
in fact it its working a while until I receiv :

host1 :
[13:33:20,054][WARN ][jgroups.FD ] I was suspected by
host2-28908; ignoring the SUSPECT message and sending back a
HEARTBEAT_ACK
[13:33:20,274][INFO ][cluster.service ] [Nighthawk] Master
{New [Nighthawk][host1-34919][data][inet[host1.domain1.com/ip1:9300]],
Previous [Anole][host2-28908][data][inet[sd-5175/ip2:9300]]}, Removed
{[Anole][host2-28908][data][inet[host2/ip2:9300]],}
Exception in thread "elasticsearch[Nighthawk][tp]-pool-1-thread-9"
java.lang.NullPointerException
at
org.elasticsearch.transport.SendRequestTransportException.(SendRequestTransportException.java:30)
at
org.elasticsearch.transport.TransportService$2.run(TransportService.java:152)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
[13:33:50,288][WARN ][jgroups.pbcast.NAKACK ] host1-34919: dropped
message from host2-28908 (not in xmit_table), keys are [host1-34919],
view=[host1-34919|2] [host1-34919]

host2:
[13:33:32,332][WARN ][jgroups.FD ] I was suspected by
host1-34919; ignoring the SUSPECT message and sending back a
HEARTBEAT_ACK
[13:33:32,335][WARN ][jgroups.pbcast.GMS ] host2-28908: not
member of view [host1-34919|2] [host1-34919]; discarding it
[13:33:32,601][WARN ][discovery.jgroups ] [Anole] Received a
wrong address type from [host1-34919], ignoring...

(received_address[org.elasticsearch.util.transport.DummyTransportAddress@e8d404)
[13:33:32,674][WARN ][discovery.jgroups ] [Anole] Received a
wrong address type from [host1-34919], ignoring...

(received_address[org.elasticsearch.util.transport.DummyTransportAddress@f278dd)
[13:33:32,675][WARN ][discovery.jgroups ] [Anole] Received a
wrong address type from [host1-34919], ignoring...

(received_address[org.elasticsearch.util.transport.DummyTransportAddress@dd3333)
[13:33:32,675][WARN ][discovery.jgroups ] [Anole] Received a
wrong address type from [host1-34919], ignoring...

(received_address[org.elasticsearch.util.transport.DummyTransportAddress@4c963c)

Everything is working fine (_cluster/nodes show 2nodes) until the
first error message then
_cluster/nodes/ show only 1 node
and nodes are not able to reconnects to the other one

2010/4/20 alexandre gerlic alexandre.gerlic@gmail.com:

ok finally I modified by hosts to add my node and it is now working.

2010/4/19 alexandre gerlic alexandre.gerlic@gmail.com:

some more infos:
it seems to be the same issue than :

http://github.com/elasticsearch/elasticsearch/issues/labels/bug#issue/40

I downloaded last version and installed it on my bad node.

new error :
[21:48:36,042][WARN ][jgroups.pbcast.NAKACK ] host1-7963: dropped
message from host2-58528 (not in xmit_table), keys are [host1-7963],
view=[host1-7963|0] [host1-7963]

I tried to change sh script to add :
java.net.preferIPv4Stack=false
java.net.preferIPv6Stack=false

but a new error appeared :
[WARN ][jgroups.TCP] no physical address ....

it seems to be a jgroup issue.

2010/4/19 alexandre gerlic alexandre.gerlic@gmail.com:

Thx, I updated my config file,
my stack was not complete I updated it :
http://gist.github.com/371357
I will continue to investigate, it is very strange, my networks seems
good.

2010/4/19 Shay Banon shay.banon@elasticsearch.com:

It seems like the host you provide in the configuration can't be resolved... Something with the network configuration?
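This diagnosis can be checked outside elasticsearch. The stack trace ends in `sun.nio.ch.Net.checkAddress`, which is where Java NIO rejects an `InetSocketAddress` whose hostname could not be looked up.

The sketch below works as follows:
- It builds such an address. The `InetSocketAddress(String, int)` constructor attempts the lookup and, on failure, marks the address unresolved instead of throwing.
- It tries a `SocketChannel` connect only when the lookup has already failed, so it never contacts a host that does resolve.

The class name `UnresolvedCheck` is made up. The default host and port come from the configs in this thread; substitute your own.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.SocketChannel;
import java.nio.channels.UnresolvedAddressException;

public class UnresolvedCheck {

    // Returns true if host:port cannot be resolved and NIO rejects it with
    // UnresolvedAddressException, the exception seen on the netty layer.
    static boolean failsAsUnresolved(String host, int port) throws IOException {
        // This constructor attempts a name lookup; on failure it does not throw,
        // it only marks the address as unresolved.
        InetSocketAddress addr = new InetSocketAddress(host, port);
        if (!addr.isUnresolved()) {
            return false; // lookup succeeded, so no connection attempt is made
        }
        try (SocketChannel channel = SocketChannel.open()) {
            channel.connect(addr); // rejected before any packet is sent
            return false;
        } catch (UnresolvedAddressException e) {
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        String host = args.length > 0 ? args[0] : "storage2.example.com";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9300;
        System.out.println(host + ":" + port + " unresolved: " + failsAsUnresolved(host, port));
    }
}
```

Running it on each node with every peer's hostname (for example `java UnresolvedCheck storage2.example.com 9300`) narrows things down. A `true` on any machine points at DNS or the hosts file on that machine rather than at the elasticsearch configuration.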
As a side note, it makes little sense to configure a gateway for an index without defining a gateway for the whole cluster. You should configure the gateway in the following manner:
gateway:
  type: fs
  fs.location: /path
This will cause any index created on the node to use the fs gateway automatically.
The "top level" gateway is important since it stores all the cluster metadata, such as the indices created, mappings, and so on.
cheers,
shay.banon

On Mon, Apr 19, 2010 at 8:58 PM, alexandre gerlic alexandre.gerlic@gmail.com wrote:

Hi, thanks. More information below:

stack : http://gist.github.com/371357

ip1 config file:
cluster:
  name: clustername
network:
  bindHost: ip1
  publishHost: ip1
index.engine.robin.refreshInterval: -1
index.gateway.snapshot_interval: -1
index.gateway.fs.location: /path
index.gateway.type: fs
index.number_of_shards: 5
index.number_of_replicas: 1

index:
  store:
    fs:
      memory:
        enabled: true
discovery:
  jgroups:
    config: tcp
    bind_port: 9700
    tcpping:
      initial_hosts: ip1[9700], ip2[9700]

My first node runs on Ubuntu 8.04 without problems; the second one, on Ubuntu 9.10, throws this exception.

When I call:
http://ip1:9200/_cluster/nodes : only 1 node
http://ip2:9200/_cluster/nodes : 2 nodes

I checked the firewall and it seems OK.
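The `initial_hosts` value above uses a `host[port]` list syntax. The hypothetical helper below is not JGroups' own parser. It splits such a value into unresolved socket addresses, so that each entry can then be checked by name from every node. In the configs posted in this thread, the bracketed port matches `discovery.jgroups.bind_port` (9700), not the netty transport port.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InitialHosts {

    // One entry looks like "host[port]"; surrounding whitespace is tolerated.
    private static final Pattern ENTRY = Pattern.compile("\\s*([^\\[\\],\\s]+)\\[(\\d+)\\]\\s*");

    // Hypothetical helper: parses a comma-separated "host[port]" list into
    // unresolved addresses. No DNS lookup happens here.
    static List<InetSocketAddress> parse(String value) {
        List<InetSocketAddress> out = new ArrayList<>();
        for (String part : value.split(",")) {
            Matcher m = ENTRY.matcher(part);
            if (!m.matches()) {
                throw new IllegalArgumentException("bad initial_hosts entry: '" + part + "'");
            }
            out.add(InetSocketAddress.createUnresolved(m.group(1), Integer.parseInt(m.group(2))));
        }
        return out;
    }

    public static void main(String[] args) {
        for (InetSocketAddress a : parse("ip1[9700], ip2[9700]")) {
            System.out.println(a.getHostString() + " port " + a.getPort());
        }
    }
}
```

Each parsed entry can then be resolved on each machine with `new InetSocketAddress(a.getHostString(), a.getPort()).isUnresolved()`. If it reports true for a peer, that peer's name is not resolvable from that node.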

2010/4/19 Shay Banon shay.banon@elasticsearch.com:

Can you post the full stack trace? Are you running in embedded mode?
cheers,
shay.banon

On Mon, Apr 19, 2010 at 2:45 AM, alexandre gerlic alexandre.gerlic@gmail.com wrote:

Hi,
I have the same issue on one of my nodes:
Exception caught on netty layer
java.nio.channels.UnresolvedAddressException
at sun.nio.ch.Net.checkAddress(Net.java:30)

I tried your fix without success.

Do you have any update on this issue?

Thanks

2010/3/31 Shay Banon shay.banon@elasticsearch.com:

Yes, very strange. I really hope to get around this when I upgrade to the upcoming jgroups version (which is still in alpha stage, so I am waiting).
-shay.banon

On Wed, Mar 31, 2010 at 2:54 AM, Gareth Stokes gareth@betechnology.com.au wrote:

I did think that was bizarre, and it was only happening on one machine in the cluster, so I can't exactly replicate the issue. I just thought I would document it in case anyone else has the same problem.

On 30 March 2010 18:09, Shay Banon shay.banon@elasticsearch.com wrote:

I am not sure that the exception you posted in the first mail relates to the updated configuration, since the exception is from the netty layer (the transport) and the jgroups fix is for the discovery layer.
In general, you don't have to set bind_addr, since it should default to network.bindHost (assuming both use the same IP version, IPv4 or IPv6).
-shay.banon
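The default described here can be pictured as a simple fallback. The sketch below is hypothetical and is not elasticsearch's settings code. It uses only the dotted setting keys that appear in this thread, returning `discovery.jgroups.bind_addr` when it is set and `network.bindHost` otherwise. The `10.0.0.5` value is purely illustrative.

```java
import java.util.HashMap;
import java.util.Map;

public class BindAddrDefault {

    // Hypothetical fallback: an explicit discovery.jgroups.bind_addr wins,
    // otherwise network.bindHost is used. Returns null if neither is set.
    static String effectiveBindAddr(Map<String, String> settings) {
        String explicit = settings.get("discovery.jgroups.bind_addr");
        if (explicit != null && !explicit.isEmpty()) {
            return explicit;
        }
        return settings.get("network.bindHost");
    }

    public static void main(String[] args) {
        Map<String, String> s = new HashMap<>();
        s.put("network.bindHost", "storage1.example.com");
        System.out.println(effectiveBindAddr(s)); // falls back to bindHost
        s.put("discovery.jgroups.bind_addr", "10.0.0.5");
        System.out.println(effectiveBindAddr(s)); // explicit value wins
    }
}
```

Setting bind_addr explicitly simply makes the first branch win. The IPv4/IPv6 caveat in the reply is about the case where the two values would otherwise point at different address families.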


--
Alexandre Gerlic



(Shay Banon) #17

Hi,

Great that things are finally working for you! I am working on a new discovery module that will replace the jgroups one, which I hope will eliminate the problems you were facing...

cheers,
shay.banon

On Thu, Apr 22, 2010 at 12:19 PM, alexandre gerlic <alexandre.gerlic@gmail.com> wrote:

OK, finally: after the node discovery crash, I stopped my first node, which created the gateway fs files during shutdown. Then I restarted it, and it seems to be working: my 2 nodes are still connected after 1 day.

2010/4/21 alexandre gerlic alexandre.gerlic@gmail.com:

I am using tcp and binary release 0.6.0:
discovery:
  jgroups:
    config: tcp
    bind_port: 9700
    tcpping:
      initial_hosts: ip1[9700], ip2[9700]

Thanks for your help

2010/4/21 Shay Banon shay.banon@elasticsearch.com:

This is very strange behavior that I sometimes get with jgroups and still have not managed to recreate. I am working on a workaround for this.
Remind me, are you using udp or tcp with jgroups?
cheers,
shay.banon

On Wed, Apr 21, 2010 at 4:13 PM, alexandre gerlic alexandre.gerlic@gmail.com wrote:

Too fast:
in fact, it works for a while until I receive:

host1:
[13:33:20,054][WARN ][jgroups.FD ] I was suspected by host2-28908; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
[13:33:20,274][INFO ][cluster.service ] [Nighthawk] Master {New [Nighthawk][host1-34919][data][inet[host1.domain1.com/ip1:9300]], Previous [Anole][host2-28908][data][inet[sd-5175/ip2:9300]]}, Removed {[Anole][host2-28908][data][inet[host2/ip2:9300]],}
Exception in thread "elasticsearch[Nighthawk][tp]-pool-1-thread-9" java.lang.NullPointerException
   at org.elasticsearch.transport.SendRequestTransportException.<init>(SendRequestTransportException.java:30)
   at org.elasticsearch.transport.TransportService$2.run(TransportService.java:152)
   at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:619)
[13:33:50,288][WARN ][jgroups.pbcast.NAKACK ] host1-34919: dropped message from host2-28908 (not in xmit_table), keys are [host1-34919], view=[host1-34919|2] [host1-34919]

host2:
[13:33:32,332][WARN ][jgroups.FD ] I was suspected by host1-34919; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
[13:33:32,335][WARN ][jgroups.pbcast.GMS ] host2-28908: not member of view [host1-34919|2] [host1-34919]; discarding it
[13:33:32,601][WARN ][discovery.jgroups ] [Anole] Received a wrong address type from [host1-34919], ignoring... (received_address[org.elasticsearch.util.transport.DummyTransportAddress@e8d404)
[13:33:32,674][WARN ][discovery.jgroups ] [Anole] Received a wrong address type from [host1-34919], ignoring... (received_address[org.elasticsearch.util.transport.DummyTransportAddress@f278dd)
[13:33:32,675][WARN ][discovery.jgroups ] [Anole] Received a wrong address type from [host1-34919], ignoring... (received_address[org.elasticsearch.util.transport.DummyTransportAddress@dd3333)
[13:33:32,675][WARN ][discovery.jgroups ] [Anole] Received a wrong address type from [host1-34919], ignoring... (received_address[org.elasticsearch.util.transport.DummyTransportAddress@4c963c)

Everything works fine (_cluster/nodes shows 2 nodes) until the first error message. After that, _cluster/nodes shows only 1 node, and the nodes are not able to reconnect to each other.



(Sergio Bossa) #18

Sounds great: what are you using for the new discovery module? Or is
it completely written from scratch?

--
Sergio Bossa
http://www.linkedin.com/in/sergiob


(Shay Banon) #19

Completely from scratch, utilizing the built-in components in elasticsearch (like the transport). I'm also trying to build one that has pluggable support for the "cloud" (more on that later...).



(system) #20