On Tue, Mar 30, 2010 at 8:15 AM, Gareth Stokes <gareth@betechnology.com.au> wrote:
Found the problem: it ended up being that I didn't have discovery.jgroups.bind_addr set. Here is the config that worked, in case anyone else has the same problem:
I am not sure that the exception you posted in the first mail relates to the updated configuration, since that exception comes from the netty layer (the transport) while the jgroups fix is for the discovery layer.
In general, you don't have to set bind_addr, since it should default to network.bindHost (assuming both use IPv4/IPv6).
On Wed, Mar 31, 2010 at 2:54 AM, Gareth Stokes <gareth@betechnology.com.au> wrote:
I did think that was bizarre, and it was only happening on one machine in the cluster, so I can't exactly replicate the issue. I just thought I would document it in case anyone else has the same problem.
Hi,
I have the same issue on one of my nodes:
Exception caught on netty layer
java.nio.channels.UnresolvedAddressException
at sun.nio.ch.Net.checkAddress(Net.java:30)
Yes, very strange. I really hope to get around this when I upgrade to the
upcoming jgroups version (which is still in alpha stage, so I am waiting).
-shay.banon
found the problem, ended up being that i didn't have
discovery.jgroups.bind_addr set, here is the config that worked in case
anyone else has the same problem:
network:
  bindHost: storage1.example.com
  publishHost: storage1.example.com
transport:
  netty:
    port: 9400
http:
  netty:
    enabled: true
    port: 9401
cluster:
  name: ExampleIndexer
discovery:
  jgroups:
    config: tcp
    bind_port: 9700
    bind_addr: storage1.example.com
    tcpping:
      initial_hosts: storage1.example.com[9700],storage2.example.com[9700]
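The UnresolvedAddressException in the log above is exactly what NIO throws when asked to connect to a hostname that DNS cannot resolve. As a quick sanity check outside elasticsearch (a minimal sketch, with `no-such-host.invalid` standing in for whatever hostname you put in the config), you can reproduce the failure directly:

```java
import java.net.InetSocketAddress;
import java.nio.channels.SocketChannel;
import java.nio.channels.UnresolvedAddressException;

public class ResolveCheck {
    public static void main(String[] args) throws Exception {
        // A hostname that DNS cannot resolve yields an "unresolved" address...
        InetSocketAddress addr = new InetSocketAddress("no-such-host.invalid", 9400);
        System.out.println("unresolved? " + addr.isUnresolved());
        // ...and NIO refuses to connect to it, throwing the same
        // UnresolvedAddressException that surfaces through the netty transport.
        try (SocketChannel ch = SocketChannel.open()) {
            ch.connect(addr);
            System.out.println("connected (hostname resolved)");
        } catch (UnresolvedAddressException e) {
            System.out.println("UnresolvedAddressException, as in the log");
        }
    }
}
```

If this prints the exception for the host you configured in bindHost or initial_hosts, the problem is name resolution (DNS or /etc/hosts), not elasticsearch.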
It seems like the host you provided in the configuration can't be resolved... something with the network configuration?
As a side note, it makes little sense to configure a gateway for an index without defining a gateway for the whole cluster. You should configure the gateway in the following manner:
gateway:
  type: fs
  fs.location: /path
This will cause any index created on the node to use the fs gateway automatically.
The "top level" gateway is important since it stores all the cluster metadata, such as indices created, mappings, and so on.
cheers,
shay.banon
I downloaded the latest version and installed it on my bad node.
New error:
[21:48:36,042][WARN ][jgroups.pbcast.NAKACK ] host1-7963: dropped message from host2-58528 (not in xmit_table), keys are [host1-7963], view=[host1-7963|0] [host1-7963]
I tried to change the sh script to add:
java.net.preferIPv4Stack=false
java.net.preferIPv6Stack=false
but a new error appeared:
[WARN ][jgroups.TCP] no physical address ....
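For what it's worth, the JDK's dual-stack behavior is controlled by the system properties java.net.preferIPv4Stack and java.net.preferIPv6Addresses (java.net.preferIPv6Stack is not one of the documented networking switches), and they are normally passed as -D flags in the launch script. A small sketch for inspecting what the running JVM actually received:

```java
public class NetPrefs {
    public static void main(String[] args) {
        // Launch with e.g.:  java -Djava.net.preferIPv4Stack=true NetPrefs
        // An unset property prints as null, meaning the JDK default applies.
        System.out.println("java.net.preferIPv4Stack     = "
                + System.getProperty("java.net.preferIPv4Stack"));
        System.out.println("java.net.preferIPv6Addresses = "
                + System.getProperty("java.net.preferIPv6Addresses"));
    }
}
```

Running this with the same flags as the elasticsearch start script shows whether the edits to the sh script actually reached the JVM.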
Thanks, I updated my config file; my stack was not complete, so I updated it: 371357’s gists · GitHub
I will continue to investigate. It is very strange; my network seems good.
Too fast: in fact it is working for a while, until I receive:
host1:
[13:33:20,054][WARN ][jgroups.FD ] I was suspected by host2-28908; ignoring the SUSPECT message and sending back a HEARTBEAT_ACK
[13:33:20,274][INFO ][cluster.service ] [Nighthawk] Master {New [Nighthawk][host1-34919][data][inet[host1.domain1.com/ip1:9300]], Previous [Anole][host2-28908][data][inet[sd-5175/ip2:9300]]}, Removed {[Anole][host2-28908][data][inet[host2/ip2:9300]],}
Exception in thread "elasticsearch[Nighthawk][tp]-pool-1-thread-9" java.lang.NullPointerException
at org.elasticsearch.transport.SendRequestTransportException.<init>(SendRequestTransportException.java:30)
at org.elasticsearch.transport.TransportService$2.run(TransportService.java:152)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
[13:33:50,288][WARN ][jgroups.pbcast.NAKACK ] host1-34919: dropped
message from host2-28908 (not in xmit_table), keys are [host1-34919],
view=[host1-34919|2] [host1-34919]
host2:
[13:33:32,332][WARN ][jgroups.FD ] I was suspected by
host1-34919; ignoring the SUSPECT message and sending back a
HEARTBEAT_ACK
[13:33:32,335][WARN ][jgroups.pbcast.GMS ] host2-28908: not
member of view [host1-34919|2] [host1-34919]; discarding it
[13:33:32,601][WARN ][discovery.jgroups ] [Anole] Received a
wrong address type from [host1-34919], ignoring...
(received_address[org.elasticsearch.util.transport.DummyTransportAddress@e8d404)
[13:33:32,674][WARN ][discovery.jgroups ] [Anole] Received a
wrong address type from [host1-34919], ignoring...
(received_address[org.elasticsearch.util.transport.DummyTransportAddress@f278dd)
[13:33:32,675][WARN ][discovery.jgroups ] [Anole] Received a
wrong address type from [host1-34919], ignoring...
(received_address[org.elasticsearch.util.transport.DummyTransportAddress@dd3333)
[13:33:32,675][WARN ][discovery.jgroups ] [Anole] Received a
wrong address type from [host1-34919], ignoring...
(received_address[org.elasticsearch.util.transport.DummyTransportAddress@4c963c)
Everything is working fine (_cluster/nodes shows 2 nodes) until the first error message; then _cluster/nodes shows only 1 node, and the nodes are not able to reconnect to each other.
This is a very strange behavior that I get with jgroups sometimes and still
have not managed to recreate it. I am working on a workaround for this.
Remind me, are you using udp or tcp with jgroups?
On 30 March 2010 13:08, gareth stokes <gareth@betechnology.com.au> wrote:
I'm having a lot of problems getting multiple nodes talking to each other; for some reason netty keeps on giving me errors.
[01:57:20,724][WARN ][transport.netty ] [Alchemy] Exception caught on netty layer [[id: 0x11fb24d3]]
java.nio.channels.UnresolvedAddressException
at sun.nio.ch.Net.checkAddress(Net.java:30)
Now I'm sure this has to do with the way I've configured my setup, but for the life of me I can't see what I'm missing.
This is my config file
OK, finally: after the node discovery crash, I stopped my first node, which created the gateway fs files during shutdown. Then I restarted it.
And it seems to be working: my 2 nodes are still connected after 1 day.
This is a very strange behavior that I get with jgroups sometimes and still
have not managed to recreate it. I am working on a workaround for this.
Remind me, are you using udp or tcp with jgroups?
cheers,
shay.banon
Too fast :
in fact it its working a while until I receiv :
host1 :
[13:33:20,054][WARN ][jgroups.FD ] I was suspected by
host2-28908; ignoring the SUSPECT message and sending back a
HEARTBEAT_ACK
[13:33:20,274][INFO ][cluster.service ] [Nighthawk] Master
{New [Nighthawk][host1-34919][data][inet[host1.domain1.com/ip1:9300]],
Previous [Anole][host2-28908][data][inet[sd-5175/ip2:9300]]}, Removed
{[Anole][host2-28908][data][inet[host2/ip2:9300]],}
Exception in thread "elasticsearch[Nighthawk][tp]-pool-1-thread-9"
java.lang.NullPointerException
at
org.elasticsearch.transport.SendRequestTransportException.(SendRequestTransportException.java:30)
at
org.elasticsearch.transport.TransportService$2.run(TransportService.java:152)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
[13:33:50,288][WARN ][jgroups.pbcast.NAKACK ] host1-34919: dropped
message from host2-28908 (not in xmit_table), keys are [host1-34919],
view=[host1-34919|2] [host1-34919]
host2:
[13:33:32,332][WARN ][jgroups.FD ] I was suspected by
host1-34919; ignoring the SUSPECT message and sending back a
HEARTBEAT_ACK
[13:33:32,335][WARN ][jgroups.pbcast.GMS ] host2-28908: not
member of view [host1-34919|2] [host1-34919]; discarding it
[13:33:32,601][WARN ][discovery.jgroups ] [Anole] Received a
wrong address type from [host1-34919], ignoring...
(received_address[org.elasticsearch.util.transport.DummyTransportAddress@e8d404)
[13:33:32,674][WARN ][discovery.jgroups ] [Anole] Received a
wrong address type from [host1-34919], ignoring...
(received_address[org.elasticsearch.util.transport.DummyTransportAddress@f278dd)
[13:33:32,675][WARN ][discovery.jgroups ] [Anole] Received a
wrong address type from [host1-34919], ignoring...
(received_address[org.elasticsearch.util.transport.DummyTransportAddress@dd3333)
[13:33:32,675][WARN ][discovery.jgroups ] [Anole] Received a
wrong address type from [host1-34919], ignoring...
Everything is working fine (_cluster/nodes shows 2 nodes) until the first error message; then _cluster/nodes shows only 1 node, and the nodes are not able to reconnect to each other.
I downloaded the latest version and installed it on my bad node. New error:
[21:48:36,042][WARN ][jgroups.pbcast.NAKACK ] host1-7963: dropped
message from host2-58528 (not in xmit_table), keys are [host1-7963],
view=[host1-7963|0] [host1-7963]
I tried to change the sh script to add:
java.net.preferIPv4Stack=false
java.net.preferIPv6Stack=false
but a new error appeared:
[WARN ][jgroups.TCP] no physical address ....
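As an aside, settings like these are JVM system properties, so they only take effect when passed with a -D prefix on the java command line. A minimal sketch of adding one to a launch script (the JAVA_OPTS variable name is an assumption about how the script builds its command line, not necessarily what the actual script uses):

```shell
#!/bin/sh
# Hypothetical launch-script fragment: -D marks the entry as a JVM
# system property. Forcing the IPv4 stack (preferIPv4Stack=true) is
# the usual fix when jgroups binds to an IPv6 address that the other
# nodes cannot reach.
JAVA_OPTS="$JAVA_OPTS -Djava.net.preferIPv4Stack=true"
echo "JVM options: $JAVA_OPTS"
```

The script would then pass $JAVA_OPTS to the java invocation that starts the node.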
Thanks, I updated my config file; my stack was not complete, so I updated it: 371357’s gists · GitHub
I will continue to investigate. It is very strange; my network seems good.
It seems like the host you provide in the configuration can't be resolved... Something with the network configuration?
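A quick way to check this from the affected machine is to query the system resolver directly; a sketch using the example hostname from the config earlier in the thread (substitute the host actually set in network.bindHost / publishHost):

```shell
#!/bin/sh
# Query the system resolver for the configured host; getent walks the
# NSS lookup path (hosts file + DNS), broadly the same path the JVM
# ends up using for hostname resolution.
host=storage1.example.com
if getent hosts "$host" >/dev/null; then
    echo "$host resolves"
else
    echo "$host does not resolve; this is what surfaces as UnresolvedAddressException"
fi
```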
As a side note, it makes little sense to configure a gateway for an index and not define a gateway for the whole cluster. You should configure the gateway in the following manner:
gateway:
  type: fs
  fs.location: /path
This will cause any index created on the node to use the fs gateway automatically. The "top level" gateway is important since it stores all the cluster meta data, such as indices created, mappings, and so on.
cheers,
shay.banon
Hi,
I have the same issue on one of my nodes:
Exception caught on netty layer
java.nio.channels.UnresolvedAddressException
at sun.nio.ch.Net.checkAddress(Net.java:30)
Yes, very strange. I really hope to get around this when I upgrade to the upcoming jgroups version (which is still in alpha stage, so I am waiting).
-shay.banon
I did think that was bizarre, and it was only happening on one machine in the cluster, so I can't exactly replicate the issue. Just thought I would document it in case anyone else has the same problem.
I am not sure that the exception you posted in the first mail relates to the updated configuration, since the exception is from the netty layer (the transport) and the jgroups fix is for the discovery layer.
In general, you don't have to set the bind_addr, since it should default to the network.bindHost (assuming both use ipv4/ipv6).
-shay.banon
Great that things are finally working for you! I am working on a new discovery module that will replace the jgroups one, which I hope will eliminate the problems you were facing...
Sounds great: what are you using for the new discovery module? Or is
it completely written from scratch?
Completely from scratch, utilizing the built-in components in elasticsearch (like the transport). Also, trying to build one that has pluggable support for the "cloud" (more on that later...).