I am wondering what the default behavior (see title) in my cluster is
supposed to be. I am collecting tweets in a cluster called Twitter. Right
now there are 2 nodes, server1 and server2. If one of these nodes fails, I
would like for the other one to pick up where it left off. Is that the
intended behavior or is there some other mechanism I am missing? I should
note that the index I created has 6 shards and 2 replicas.
I guess what I am looking for is the best way to prevent failure on my
cluster. I am thinking that I can set up 2 masters and then some data nodes
but I really need to ensure that if data has stopped being collected by one
node, the other one will pick it up and run with it. I noticed yesterday
that both master nodes can't have a river on them. Or maybe it's that both
masters can't have a river connected to an index with the same name. Anyone
have thoughts on this?
A
(Follow-up to Adam Estrada's post of Monday, July 30, 2012 5:33:45 PM UTC-4.)
Rivers are run as a single instance per cluster. That is the main
benefit of utilizing a river: the indexing is done at the
cluster-level, so it can continue even with partial node failures.
That said, I have never tested how well a river responds should the
node it is running on go down.
Thanks for the feedback. It looks like when the twitter river stops working
for any reason, the other instance(s) will not pick it up either. This is
what you mentioned. So, how do you recommend making sure the rivers never
stop running?
Adam
(In reply to Ivan Brusic's message of Tue, Jul 31, 2012 at 4:45 PM.)
I already saw that issue some months ago, but Shay and I did not find a way to solve it.
It's related to the Twitter river itself, not to rivers in general.
The Twitter river fails but does not restart itself.
What I did was create a cron job (a shell script) that looks into the logs and restarts the node on each error. As I had 2 nodes, the Twitter river restarted on node 2.
That was a little workaround, but not the best way to solve it!
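For readers who want to try the same workaround, here is a minimal sketch of such a watchdog, written in Python rather than as the original shell script (which was not posted in this thread). Everything specific in it is an assumption: the log pattern, the state file that remembers how far the log has been read, and the restart command all depend on your installation.

```python
import re
import subprocess
from pathlib import Path

# Assumption: river failures appear as ERROR or WARN log lines that mention
# "river". The exact wording depends on the Elasticsearch and plugin versions.
RIVER_ERROR = re.compile(r"\[(ERROR|WARN)\s*\].*river", re.IGNORECASE)


def find_river_errors(lines, pattern=RIVER_ERROR):
    """Return the log lines that look like river failures."""
    return [line for line in lines if pattern.search(line)]


def read_new_lines(log_path, offset_path):
    """Read only the lines appended to the log since the previous run.

    The byte offset is kept in a small state file (hypothetical) so that
    one old error does not trigger a restart on every cron run.
    """
    log_path, offset_path = Path(log_path), Path(offset_path)
    if not log_path.exists():
        return []
    offset = int(offset_path.read_text()) if offset_path.exists() else 0
    if offset > log_path.stat().st_size:
        offset = 0  # the log was rotated or truncated, so start over
    with log_path.open("rb") as f:
        f.seek(offset)
        data = f.read()
    offset_path.write_text(str(offset + len(data)))
    return data.decode("utf-8", errors="replace").splitlines()


def check_and_restart(log_path, offset_path, restart_cmd):
    """Run restart_cmd if new river errors appeared; return True if it ran."""
    errors = find_river_errors(read_new_lines(log_path, offset_path))
    if errors:
        # restart_cmd is whatever restarts the node on your system.
        subprocess.run(restart_cmd, check=True)
    return bool(errors)
```

A cron entry could run a small script that calls `check_and_restart(log_path, offset_path, restart_cmd)` every minute or so, with the paths and the restart command filled in for the local setup. As noted above, this only works around the problem: it relies on a second node being available to take over the river while the first one restarts.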
We are investigating how to fix the problem in the river code. We'll share
the fixes as they come in. I would still be interested in seeing your
code too, though.
Adam
(In reply to David Pilato's message of Tuesday, July 31, 2012 8:13:23 PM UTC-4.)