Multiple nodes on a powerful system?


(Josh Harrison) #1

Other than the resource footprint, is there any reason we should avoid
running multiple node instances of a cluster on the same machine, assuming
all the shard awareness stuff is in place to keep all the copies of a given
shard from being stored on those nodes that are all resident on a single
physical box?
Basically, if I had a cluster of, say, six machines with 512GB of RAM
apiece, is it reasonable to run six instances of ES per machine with 30GB of
heap allocated per instance, resulting in a 36 node cluster and a bit
over a terabyte of memory footprint across the cluster?
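To spell out the arithmetic behind those figures (just the numbers from the paragraph above):

```python
# Sanity check of the figures in the question (all values from the post).
machines = 6
instances_per_machine = 6
heap_gb_per_instance = 30

total_nodes = machines * instances_per_machine       # 36 nodes
total_heap_gb = total_nodes * heap_gb_per_instance   # 1080 GB, "a bit over a terabyte"
print(total_nodes, total_heap_gb)
```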

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/1df2f7ff-26af-4ce8-a734-29aa9dbd90dd%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Nik Everett) #2

On Wed, Jan 29, 2014 at 4:50 PM, Josh Harrison hijakk@gmail.com wrote:

Other than the resource footprint, is there any reason we should avoid
running multiple node instances of a cluster on the same machine, assuming
all the shard awareness stuff is in place to keep all the copies of a given
shard from being stored on those nodes that are all resident on a single
physical box?
Basically, if I had a cluster of, say, six machines with 512GB of RAM
apiece, is it reasonable to run six instances of ES per machine with 30GB of
heap allocated per instance, resulting in a 36 node cluster and a bit
over a terabyte of memory footprint across the cluster?

So far as I've heard this is sensible if heap is your bottleneck.
Obviously it doesn't help if disk space, iops, or cpu are your bottleneck.
The deb package doesn't support it without modification. Now you know all
about the subject that I do :slight_smile:

Nik



(Mark Walkom) #3

I know some people on IRC are using containers (docker), we went down the
virtualisation path instead.
Both work fine.

Regards,
Mark Walkom

Infrastructure Engineer
Campaign Monitor
email: markw@campaignmonitor.com
web: www.campaignmonitor.com

On 30 January 2014 09:03, Nikolas Everett nik9000@gmail.com wrote:

On Wed, Jan 29, 2014 at 4:50 PM, Josh Harrison hijakk@gmail.com wrote:

Other than the resource footprint, is there any reason we should avoid
running multiple node instances of a cluster on the same machine, assuming
all the shard awareness stuff is in place to keep all the copies of a given
shard from being stored on those nodes that are all resident on a single
physical box?
Basically, if I had a cluster of, say, six machines with 512GB of RAM
apiece, is it reasonable to run six instances of ES per machine with 30GB of
heap allocated per instance, resulting in a 36 node cluster and a bit
over a terabyte of memory footprint across the cluster?

So far as I've heard this is sensible if heap is your bottleneck.
Obviously it doesn't help if disk space, iops, or cpu are your bottleneck.
The deb package doesn't support it without modification. Now you know all
about the subject that I do :slight_smile:

Nik




(Jörg Prante) #4

You should consider the RAM-to-CPU/disk ratio. On systems with huge memory,
the CPU tends to become the relatively weak component, and the I/O subsystem
has to push data from RAM to the drives (spindle or SSD) under higher pressure.
Huge RAM helps with caching strategies but also creates headaches: large
caches must be long-lived and must not collapse, which is hard in a large
JVM heap, and JVM garbage collection will take more resources and time.

Running multiple JVMs on a single host only looks like a viable solution,
but that is not how ES scales. ES scales horizontally over many machines,
not vertically over RAM size.

So you should take care that your CPU performance does not suffer. There
is also overhead at the OS layer, and how much depends on the setup.

A 36 node cluster on 6 machines adds another challenge. You must tell ES
how your nodes are organized, in order to get a reliable green/yellow/red
cluster health for your shard allocation.
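For example, with shard allocation awareness you can tag each node with the
physical machine it lives on, so ES keeps copies of a shard off the same box.
(The attribute name "host_id" and the value are only illustrative, and exact
setting names may differ between ES versions; treat this as a sketch.)

```yaml
# elasticsearch.yml sketch for one ES instance on physical machine "box1".
# "host_id" is an arbitrary attribute name chosen for this example.
node.host_id: box1
cluster.routing.allocation.awareness.attributes: host_id
```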

Jörg



(Josh Harrison) #5

Thanks Jörg, Mark and Nikolas, some great information here. The 6x6
configuration was something of a worst-case example; the farthest we'd
probably stretch it would be 3 nodes per host on 16-18 hosts, which should
be a little more reasonable. Hopefully we'll be able to set up a support
contract with the commercial side of ES and get some help building out a
system that meets our exact needs.

On Wednesday, January 29, 2014 5:02:05 PM UTC-8, Jörg Prante wrote:

You should consider the RAM-to-CPU/disk ratio. On systems with huge
memory, the CPU tends to become the relatively weak component, and the I/O
subsystem has to push data from RAM to the drives (spindle or SSD) under
higher pressure.
Huge RAM helps with caching strategies but also creates headaches: large
caches must be long-lived and must not collapse, which is hard in a large
JVM heap, and JVM garbage collection will take more resources and time.

Running multiple JVMs on a single host only looks like a viable solution,
but that is not how ES scales. ES scales horizontally over many machines,
not vertically over RAM size.

So you should take care that your CPU performance does not suffer. There
is also overhead at the OS layer, and how much depends on the setup.

A 36 node cluster on 6 machines adds another challenge. You must tell ES
how your nodes are organized, in order to get a reliable green/yellow/red
cluster health for your shard allocation.

Jörg


