Hi,
Great that things are finally working for you! I am working on a new
discovery module that will replace the jgroups one, which I hope will
eliminate the problems you were facing.
cheers,
shay.banon
On Thu, Apr 22, 2010 at 12:19 PM, alexandre gerlic <alexandre.gerlic@gmail.com> wrote:
OK, finally: after the node discovery crash, I stopped my first node,
which wrote the gateway fs files during shutdown.
Then I restarted it.
It seems to be working: my 2 nodes are still connected after 1 day.
2010/4/21 alexandre gerlic alexandre.gerlic@gmail.com:
I am using tcp and the binary release: 0.6.0
discovery:
  jgroups:
    config: tcp
    bind_port: 9700
    tcpping:
      initial_hosts: ip1[9700], ip2[9700]
thx for your help
2010/4/21 Shay Banon shay.banon@elasticsearch.com:
This is very strange behavior that I sometimes get with jgroups and still
have not managed to recreate. I am working on a workaround for this.
Remind me, are you using udp or tcp with jgroups?
cheers,
shay.banon
On Wed, Apr 21, 2010 at 4:13 PM, alexandre gerlic alexandre.gerlic@gmail.com wrote:
Too fast:
in fact it works for a while until I receive:
host1:
[13:33:20,054][WARN ][jgroups.FD ] I was suspected by
host2-28908; ignoring the SUSPECT message and sending back a
HEARTBEAT_ACK
[13:33:20,274][INFO ][cluster.service ] [Nighthawk] Master
{New [Nighthawk][host1-34919][data][inet[host1.domain1.com/ip1:9300]],
Previous [Anole][host2-28908][data][inet[sd-5175/ip2:9300]]}, Removed
{[Anole][host2-28908][data][inet[host2/ip2:9300]],}
Exception in thread "elasticsearch[Nighthawk][tp]-pool-1-thread-9" java.lang.NullPointerException
    at org.elasticsearch.transport.SendRequestTransportException.<init>(SendRequestTransportException.java:30)
    at org.elasticsearch.transport.TransportService$2.run(TransportService.java:152)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
[13:33:50,288][WARN ][jgroups.pbcast.NAKACK ] host1-34919: dropped
message from host2-28908 (not in xmit_table), keys are [host1-34919],
view=[host1-34919|2] [host1-34919]
host2:
[13:33:32,332][WARN ][jgroups.FD ] I was suspected by
host1-34919; ignoring the SUSPECT message and sending back a
HEARTBEAT_ACK
[13:33:32,335][WARN ][jgroups.pbcast.GMS ] host2-28908: not
member of view [host1-34919|2] [host1-34919]; discarding it
[13:33:32,601][WARN ][discovery.jgroups ] [Anole] Received a
wrong address type from [host1-34919], ignoring...(received_address[org.elasticsearch.util.transport.DummyTransportAddress@e8d404)
[13:33:32,674][WARN ][discovery.jgroups ] [Anole] Received a
wrong address type from [host1-34919], ignoring...(received_address[org.elasticsearch.util.transport.DummyTransportAddress@f278dd)
[13:33:32,675][WARN ][discovery.jgroups ] [Anole] Received a
wrong address type from [host1-34919], ignoring...(received_address[org.elasticsearch.util.transport.DummyTransportAddress@dd3333)
[13:33:32,675][WARN ][discovery.jgroups ] [Anole] Received a
wrong address type from [host1-34919], ignoring...(received_address[org.elasticsearch.util.transport.DummyTransportAddress@4c963c)
Everything works fine (_cluster/nodes shows 2 nodes) until the
first error message; then _cluster/nodes shows only 1 node
and the nodes are not able to reconnect to each other.
2010/4/20 alexandre gerlic alexandre.gerlic@gmail.com:
OK, finally I modified my hosts to add my node and it is now working.
2010/4/19 alexandre gerlic alexandre.gerlic@gmail.com:
some more info:
it seems to be the same issue as: http://github.com/elasticsearch/elasticsearch/issues/labels/bug#issue/40
I downloaded the latest version and installed it on my bad node.
new error:
[21:48:36,042][WARN ][jgroups.pbcast.NAKACK ] host1-7963: dropped
message from host2-58528 (not in xmit_table), keys are [host1-7963],
view=[host1-7963|0] [host1-7963]
I tried changing the sh script to add:
java.net.preferIPv4Stack=false
java.net.preferIPv6Stack=false
but a new error appeared:
[WARN ][jgroups.TCP] no physical address ....
It seems to be a jgroups issue.
2010/4/19 alexandre gerlic alexandre.gerlic@gmail.com:
Thx, I updated my config file.
My stack trace was not complete, so I updated it:
371357’s gists · GitHub
I will continue to investigate; it is very strange, my network seems good.
2010/4/19 Shay Banon shay.banon@elasticsearch.com:
It seems like the host you provide in the configuration can't be
resolved. Something with the network configuration?
As a side note, it makes little sense to configure a gateway for an index
without defining a gateway for the whole cluster. You should configure the
gateway in the following manner:
gateway:
  type: fs
  fs.location: /path
This will cause any index created on the node to use the fs gateway
automatically.
The "top level" gateway is important since it stores all the cluster
metadata, such as indices created, mappings, and so on.
cheers,
shay.banon
On Mon, Apr 19, 2010 at 8:58 PM, alexandre gerlic alexandre.gerlic@gmail.com wrote:
Hi, thx, more information below.
stack: 371357’s gists · GitHub
ip1 config file:
cluster:
  name: clustername
network:
  bindHost: ip1
  publishHost: ip1
index.engine.robin.refreshInterval: -1
index.gateway.snapshot_interval: -1
index.gateway.fs.location: /path
index.gateway.type: fs
index.number_of_shards : 5
index.number_of_replicas : 1
index :
  store:
    fs:
      memory:
        enabled: true
discovery:
  jgroups:
    config: tcp
    bind_port: 9700
    tcpping:
      initial_hosts: ip1[9700], ip2[9700]
My first node is working on Ubuntu 8.04 without problems;
the second one, on Ubuntu 9.10, throws this exception. When I call:
http://ip1:9200/_cluster/nodes : only 1 node
http://ip2:9200/_cluster/nodes : 2 nodes
I checked the firewall and it seems ok.
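
Since java.nio.channels.UnresolvedAddressException is raised when a socket is given a hostname that has not been resolved, one way to narrow this down might be to check, on each node, whether every entry in initial_hosts resolves locally. The following is only a minimal sketch, assuming Python 3 is available on the machine; parse_initial_hosts and resolves are hypothetical helper names and are not part of elasticsearch or jgroups.

```python
import re
import socket

# Matches one "host[port]" entry, the form used by the tcpping
# initial_hosts setting in the configs in this thread.
_ENTRY = re.compile(r"^\s*([^\[\]\s,]+)\[(\d+)\]\s*$")


def parse_initial_hosts(value):
    """Split "host1[9700], host2[9700]" into [(host, port), ...]."""
    pairs = []
    for entry in value.split(","):
        match = _ENTRY.match(entry)
        if match is None:
            raise ValueError("not in host[port] form: %r" % entry)
        pairs.append((match.group(1), int(match.group(2))))
    return pairs


def resolves(host, port):
    """Return True if the local resolver can turn host into an address."""
    try:
        socket.getaddrinfo(host, port)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False
```

Run on each node, `[h for h, p in parse_initial_hosts("ip1[9700], ip2[9700]") if not resolves(h, p)]` would list the entries that this machine cannot resolve. It only tests name resolution, not whether the port is reachable.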
2010/4/19 Shay Banon shay.banon@elasticsearch.com:
Can you post the full stack trace? Are you running in embedded mode?
cheers,
shay.banon
On Mon, Apr 19, 2010 at 2:45 AM, alexandre gerlic alexandre.gerlic@gmail.com wrote:
Hi,
I have the same issue on one of my nodes:
Exception caught on netty layer
java.nio.channels.UnresolvedAddressException
    at sun.nio.ch.Net.checkAddress(Net.java:30)
I tried your fix without success.
Do you have any update on this issue?
Thx
2010/3/31 Shay Banon shay.banon@elasticsearch.com:
Yes, very strange. I really hope to get around this when I upgrade to the
upcoming jgroups version (which is still in alpha stage, so I am waiting).
-shay.banon
On Wed, Mar 31, 2010 at 2:54 AM, Gareth Stokes gareth@betechnology.com.au wrote:
I did think that was bizarre, and it was only happening on one machine in
the cluster, so I can't exactly replicate the issue. Just thought I would
document it in case anyone else has the same problem.
On 30 March 2010 18:09, Shay Banon shay.banon@elasticsearch.com wrote:
I am not sure that the exception you posted at first relates to the
updated configuration, since the exception is from the netty layer (the
transport) and the jgroups fix is for the discovery layer.
In general, you don't have to set the bind_addr, since it should default
to the network.bindHost (assuming both use ipv4/ipv6).
-shay.banon
On Tue, Mar 30, 2010 at 8:15 AM, Gareth Stokes gareth@betechnology.com.au wrote:
Found the problem: it ended up being that I didn't have
discovery.jgroups.bind_addr set. Here is the config that worked, in case
anyone else has the same problem:
network:
  bindHost: storage1.example.com
  publishHost: storage1.example.com
transport:
  netty:
    port: 9400
http:
  netty:
    enabled: true
    port: 9401
cluster:
  name: ExampleIndexer
discovery:
  jgroups:
    config: tcp
    bind_port: 9700
    bind_addr: storage1.example.com
    tcpping:
      initial_hosts: storage1.example.com[9700],storage2.example.com[9700]
On 30 March 2010 13:08, gareth stokes gareth@betechnology.com.au wrote:
I'm having a lot of problems getting multiple nodes talking to each
other; for some reason netty keeps on giving me errors.
[01:57:20,724][WARN ][transport.netty ] [Alchemy] Exception caught on netty layer [[id: 0x11fb24d3]]
java.nio.channels.UnresolvedAddressException
    at sun.nio.ch.Net.checkAddress(Net.java:30)
Now I'm sure this has to do with the way I've configured my setup, but
for the life of me I can't see what I'm missing.
This is my config file:
network :
  bindHost : storage1.example.com
  publishHost : storage1.example.com
transport :
  netty :
    port : 9300
cluster :
  name : StorageIndexer
discovery :
  jgroups :
    config : tcp
    bind_port : 9400
    tcpping :
      initial_hosts : storage1.example.com[9400],storage2.example.com[9400]
--
Alexandre Gerlic