Heterogeneous clusters

Hi there,

We're working on a project that is going to use ES on a pretty large scale.
However, we're thinking about deploying it on a lot of small boxes that also
run other tasks. The idea is to keep the overhead per server minimal and to
require no other servers. The amount of indexed data can get quite large,
though: a rough estimate is 100GB per node, with a sliding window (e.g.
drop indices older than X days).
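
For illustration, the sliding window could be sketched roughly as below. This assumes one index per day, named with a hypothetical prefix `logs-` followed by an ISO date; the function only picks the expired names, and actually deleting each returned index is left out.

```python
from datetime import date, timedelta

def indices_to_drop(index_names, today, keep_days, prefix="logs-"):
    """Return the daily indices whose date suffix is older than the window."""
    cutoff = today - timedelta(days=keep_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue  # not one of our daily indices
        try:
            day = date.fromisoformat(name[len(prefix):])
        except ValueError:
            continue  # suffix is not a YYYY-MM-DD date; leave it alone
        if day < cutoff:
            expired.append(name)
    return sorted(expired)

# Hypothetical index names, for illustration only.
names = ["logs-2012-09-10", "logs-2012-09-16", "logs-2012-09-18", "other"]
print(indices_to_drop(names, today=date(2012, 9, 18), keep_days=7))
# prints ['logs-2012-09-10']
```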

In practice this means that it runs on a box with 64GB RAM and SSDs, a box
with 16GB RAM and 12x 1TB, and also on a box with 1.7GB RAM and 300GB of NFS
storage. Does ES take node resources into account when sharding, or do we
have a "problem"?

Best regards,

Robin Verlangen
Software engineer
W http://www.robinverlangen.nl
E robin@us2.nl

Disclaimer: The information contained in this message and attachments is
intended solely for the attention and use of the named addressee and may be
confidential. If you are not the intended recipient, you are reminded that
the information remains the property of the sender. You must not use,
disclose, distribute, copy, print or rely on this e-mail. If you have
received this message in error, please contact the sender immediately and
irrevocably delete this message and any copies.

--

Houston calling :wink:

ES treats all nodes as equal, so you really want to make your cluster as
homogeneous as possible
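
A toy model of what "all nodes as equal" means for mixed hardware, assuming shards are simply spread evenly by count (a simplification, not the actual allocator). The RAM figures come from the question; the node names and the shard count are made up.

```python
def spread_evenly(shard_count, nodes):
    """Assign shards round-robin, ignoring node size (a simplification)."""
    counts = {name: 0 for name in nodes}
    names = list(nodes)
    for i in range(shard_count):
        counts[names[i % len(names)]] += 1
    return counts

# RAM in GB as given in the question; node names are hypothetical.
nodes = {"big": 64.0, "medium": 16.0, "small": 1.7}
counts = spread_evenly(30, nodes)
for name, ram_gb in nodes.items():
    per_gb = counts[name] / ram_gb
    print(f"{name}: {counts[name]} shards, {per_gb:.2f} shards per GB RAM")
```

Every node ends up with the same number of shards, so the smallest box carries far more shards per GB of RAM than the largest one.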

clint

--

OK, sounds like we have a problem :wink:

Random idea: I see that ES can run on 1GB ram. Does it make any sense to
run 8 instances on a large box, and just one on a small box? Or am I trying
to do something that's just not designed to be like this?

Best regards,

Robin Verlangen

--

On Tue, 2012-09-18 at 11:45 +0200, Robin Verlangen wrote:

OK, sounds like we have a problem :wink:

Random idea: I see that ES can run on 1GB ram. Does it make any sense
to run 8 instances on a large box, and just one on a small box? Or am
I trying to do something that's just not designed to be like this?

The problem you have with that is that you might end up with primaries
and replicas on one box. That box goes down, and you've lost your data.
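
To make the risk concrete, here is a small sketch that scans a shard layout for copies of the same shard sitting on one physical box. The layout, node names and host names are hypothetical; in practice they would have to come from the cluster state.

```python
from collections import defaultdict

def colocated_copies(assignments, node_host):
    """Find shards with more than one copy (primary or replica) on one host.

    assignments: iterable of (shard_id, node_name) pairs, one per shard copy.
    node_host:   mapping from node name to the physical host it runs on.
    """
    copies = defaultdict(list)
    for shard_id, node in assignments:
        copies[(shard_id, node_host[node])].append(node)
    return {key: nodes for key, nodes in copies.items() if len(nodes) > 1}

# Two instances (n1, n2) on one large box, one instance on a small box.
node_host = {"n1": "bigbox", "n2": "bigbox", "n3": "smallbox"}
assignments = [(0, "n1"), (0, "n2"), (1, "n1"), (1, "n3")]
print(colocated_copies(assignments, node_host))
# prints {(0, 'bigbox'): ['n1', 'n2']}
```

Shard 0 has both of its copies on `bigbox`, so losing that box loses the shard; shard 1 survives because its copies are on different boxes.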

clint

--

Also, if you're only running with 1GB, a proportionately larger share of
RAM will be used for Java, code, and state than you'd have with bigger
boxes with more RAM.
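
As a back-of-the-envelope illustration of that proportion: the fixed overhead figure below is purely hypothetical and not a measured number; only the trend matters.

```python
def overhead_share(total_ram_gb, fixed_overhead_gb):
    """Fraction of a node's RAM consumed by a fixed per-instance cost."""
    return fixed_overhead_gb / total_ram_gb

FIXED_OVERHEAD_GB = 0.25  # hypothetical figure, for illustration only

for ram_gb in (1.0, 16.0, 64.0):
    share = overhead_share(ram_gb, FIXED_OVERHEAD_GB)
    print(f"{ram_gb:>5.1f} GB node: {share:.1%} of RAM is fixed overhead")
```

The same fixed cost is a quarter of a 1GB instance but well under one percent of a 64GB one.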

clint

--

That makes sense indeed. I'll go through our design again and see how we
can resolve this without adding a lot of overhead on machines.

Do you know whether support for heterogeneous clusters is planned for the
near future?

Best regards,

Robin Verlangen

--

There is a setting called cluster.routing.allocation.same_shard.host. If you set it to true, ES will make sure not to allocate a shard and its replica on the same "host", regardless of how many instances are running on that host (where host is based on the network address).
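
A minimal sketch of how the setting might be applied, assuming your version accepts it through the cluster update settings API (`PUT /_cluster/settings`); if it does not, the same key would go into each node's configuration file instead. The helper name is made up, and the code only builds the request body, it does not send it.

```python
import json

SETTING = "cluster.routing.allocation.same_shard.host"

def same_host_settings(enabled=True, scope="persistent"):
    """Build a cluster-settings body that toggles the same-host check."""
    return {scope: {SETTING: enabled}}

# Body to send with PUT /_cluster/settings (sending is not shown here).
body = json.dumps(same_host_settings(), indent=2)
print(body)
```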

--

That sounds like a possible workaround without real problems. However, I
think ES is not designed for this purpose, so we might need to reconsider
the pros and cons.

Best regards,

Robin Verlangen

2012/9/18 Shay Banon kimchy@gmail.com

There is a setting called: cluster.routing.allocation.same_shard.host,
where if you set it to true, it will make sure not to allocate a shard and
a replica on the same "host", regardless of instances running on the host
(where host is based on the network address).

--

Agreed. You can try to work around it by running several instances and so on, but it makes more sense to have same-size nodes in the cluster.

--

Well, actually we would like to spread the load over every machine running
our software, but with this ES behavior that doesn't sound like a good
plan. I think we should go with the option of only running ES on a subset
of machines that are equal in terms of resources.

Best regards,

Robin Verlangen

--