Load Balancing When a Node Goes Down

Hi to all,

First, I will share our Elasticsearch server configuration.
We have 4 nodes in total [1 master-only node with data=false, 3 data nodes
with data=true], all running in the same cluster, but two nodes run on one
server [xx.xx.xx.88] and the remaining two run on the other
server [xx.xx.xx.89].
Each node has 5 GB of heap memory.

Second, here is what we are doing with these nodes:
A. We load documents into these nodes using a routing mechanism.
That is, we have an algorithm that derives a routing value from the MSISDN
number, and we pass this routing value when inserting each record into
Elasticsearch.
Our shard and replica settings are 8 shards and 0 replicas,
so all indices have 8 primary shards and 0 replicas.
When we insert a document, it is stored on the shard determined by its
routing value.
This means that for a fixed routing value, the document is sent to the
same shard every time, and that shard lives on a fixed data node.
So the indexing operation tightly couples each record to a specific node
(a rough sketch of such an index call follows below).
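
For reference, our index calls look roughly like the sketch below (only an
illustration; the index name, type name, and hash function are placeholders,
not our real algorithm):

    import hashlib
    import requests

    ES = "http://xx.xx.xx.88:9200"  # any node in the cluster

    def routing_for(msisdn: str) -> str:
        # Placeholder for our MSISDN-based routing algorithm.
        return hashlib.md5(msisdn.encode()).hexdigest()[:4]

    doc = {"msisdn": "919876543210", "name": "example subscriber"}

    # Every document with the same routing value hashes to the same shard,
    # so all records for one routing value live on one data node.
    resp = requests.post(
        f"{ES}/subscribers/record",  # hypothetical index/type names
        params={"routing": routing_for(doc["msisdn"])},
        json=doc,
    )
    print(resp.status_code, resp.json())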

Unfortunately, one of our data nodes failed while a query operation was being
performed [a bulk loading operation is running continuously in the
background].
The failed node is unresponsive, while the other data nodes and the master
node are still responsive. Because of this,
some records were not inserted into Elasticsearch [the records are tightly
coupled to nodes, as explained above in point A].

Now, what we need to know is:

  1. How do we load-balance when a node fails?
    [i.e., the shards of the node that went down should remain available for index/query operations...]
  2. Please share details about how this load balancing works, and
  3. whether it will solve our problem.

Also, please tell us if there is any other mechanism to fix this problem;
please share if you know of one.
We appreciate your reply.

Thanks and Regards
Rafi

  1. If you want higher availability (the ability to survive nodes going
    down) you will need replicas. With your current scheme of 8 shards and
    zero replicas, there is no way to survive a node outage, because each
    piece of data lives in a single shard, on a single node. (A minimal
    example of adding a replica follows after this list.)

  2. Elasticsearch automatically performs routing and round-robin
    balancing at the cluster level. You can send a request to any node and it
    will forward the request to the correct node. If you have multiple copies
    of the data (replicas), Elasticsearch will automatically round-robin
    between these copies to distribute the load amongst your cluster.
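
As a rough illustration (assuming an index named my_index and a node
reachable on the default HTTP port 9200; adjust to your setup), adding a
replica is a single live settings update:

    import requests

    ES = "http://xx.xx.xx.88:9200"  # any node will forward the request

    # Raise the replica count on an existing index; Elasticsearch will then
    # allocate a replica copy of every primary shard on a different node.
    resp = requests.put(
        f"{ES}/my_index/_settings",
        json={"index": {"number_of_replicas": 1}},
    )
    print(resp.json())

New indices can get the same setting at creation time in their index settings.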

I would also like to add that you don't need four Elasticsearch instances
on only two machines. Elasticsearch will happily use all the resources of a
single machine; you do not gain anything by running additional instances
on a single box. In fact it can hurt performance, and it makes availability
difficult (since a primary and a replica can theoretically live on the same
physical machine, eliminating the benefit of replicas).

-Zach

I believe that if you set it up with 1 replica, it will keep one copy
of the data on a separate node and will survive the failure of any one
node.

Ralph

That's correct. BUT, with the described setup of multiple nodes on the same
server, you have to tell ES which nodes to consider part of the same 'failure'
zone and keep the replicas apart: cluster.routing.allocation.awareness.attributes.
There's an example in the docs using racks; if you run in EC2 you would use
the AZ. A rough example follows below.

Zachary's advice is great.
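
Roughly, and assuming you keep more than one node per server, it works like
this (the attribute name rack_id and its values are just the example from the
docs, nothing special): give every node on the .88 box node.rack_id: server_88
and every node on the .89 box node.rack_id: server_89 in their
elasticsearch.yml, then tell the cluster to use that attribute for allocation
awareness:

    import requests

    ES = "http://xx.xx.xx.88:9200"

    # With awareness enabled, Elasticsearch tries to keep a primary shard and
    # its replica on nodes with different rack_id values, i.e. on different
    # physical servers.
    resp = requests.put(
        f"{ES}/_cluster/settings",
        json={"persistent": {
            "cluster.routing.allocation.awareness.attributes": "rack_id"
        }},
    )
    print(resp.json())

If your version does not accept this as a dynamic cluster setting, put the
same key in elasticsearch.yml on every node instead.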

Hi to all,

#Zachary Tong and Ralph Trickey
Thank you very much for your replies; you gave me a good explanation.

  1. As you said, it is absolutely correct that we need replicas to survive
    a node failure.
    The problem with replicas is that they occupy additional disk space, but we
    know that performance-wise it is good to have them.
    We are looking for an alternative solution to this problem; if we do not
    find one, we will follow your suggestion.
    Once again, thanks for your suggestion on this problem.
  2. You said that it is not good practice to run multiple Elasticsearch nodes
    on one physical box.
    [Here, physical box means the two servers we have, i.e., xx.xx.xx.88
    and xx.xx.xx.89.]
    Do you mean one master node [data=false] plus one data node [data=true] on
    one physical box [xx.xx.xx.88], and only one data node [data=true] on the
    other physical box [say, xx.xx.xx.89]?
    Can you please give me further suggestions about this point 2?

#Norberto Meijome
Thank you very much for your reply. We appreciate it.

You wrote: "you have to tell ES which nodes to consider in the 'failure' zone
and keep the replicas apart - cluster.routing.allocation.awareness.attributes.
There's an example in the docs using racks, if you run in EC2 you would
use the AZ."
Could you please elaborate on this? Thanks in advance.

If anyone knows about this, please reply.
Is there any other way to do load balancing when a node fails [in the case
of no replicas, with routing enabled for loading documents]?
And how can the documents in the shards of the node that went down remain
available for index/query operations?
[I mean, is there any mechanism in Elasticsearch such that, if a node goes down, all shards of that node are moved to another node, so that we can continue to perform both index and query operations?]
We appreciate your reply.

Thanks and regards
Rafi

Hi Rafi,

  1. If you want redundant data, you have to make copies of the data.
    There is really no way around that. If all the data lives on a single
    node...and that node dies...how are you supposed to move the data off of
    it? It is simply impossible. If you have replicas enabled and a node goes
    down, the replicas will automatically promote themselves to primary and
    you can continue indexing into them (see the health-check sketch after
    this list).

How big is your data? HDD space is fairly cheap these days.

  2. I would have a single Elasticsearch instance for each physical
    machine. So one node on xx.88 and one node on xx.89. You can make both
    nodes [data=true, master=true], and they will determine who should be
    master. In this setup there is really no need for a dedicated master
    instance; that architecture only makes sense when you have a third
    (lightweight) machine acting as a dedicated master.
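
As a quick sanity check (a sketch, assuming the surviving node's HTTP port is
reachable), you can watch cluster health while a node is down: with at least
one replica the status drops to yellow but everything stays indexable and
searchable, whereas with zero replicas it goes red and the shards that lived
on the dead node are simply unavailable until it comes back:

    import requests

    ES = "http://xx.xx.xx.89:9200"  # ask any node that is still up

    health = requests.get(f"{ES}/_cluster/health").json()
    # "green"  = all primaries and replicas allocated
    # "yellow" = all primaries allocated, some replicas unassigned (data complete)
    # "red"    = some primary shards unassigned (data unavailable)
    print(health["status"], "unassigned shards:", health["unassigned_shards"])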

Have you ever profiled Elasticsearch running as just two servers set up
with the defaults? The changes in your setup smell like premature
optimizations (dedicated master node, no replicas, multiple nodes per server).

Cheers,

Ivan

Hi Zachary Tong and Ivan,

Thanks a lot for your replies. Now I have an idea about this issue and I
will make the changes as you suggested.
Once again, thanks for the detailed explanation of what I have to do
to solve this issue.

Warm Regards,
Rafi
