Other than the resource footprint, is there any reason we should avoid
running multiple node instances of a cluster on the same machine, assuming
all the shard awareness stuff is in place to keep all the copies of a given
shard from being stored on the nodes that are resident on a single
physical box?
Basically, if I had a cluster of, say, six machines with 512GB of RAM
apiece, is it reasonable to run six instances of ES per machine with 30GB of
heap allocated per instance, resulting in a 36-node cluster with a bit
over a terabyte of memory footprint across the cluster?
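For reference, the "shard awareness stuff" would look roughly like the sketch below in each instance's elasticsearch.yml, assuming ES 1.x-era settings and a made-up box_id attribute that names the physical machine; with awareness enabled on that attribute, ES avoids putting a primary and its replica on nodes that share the same box_id value.

    # elasticsearch.yml for one of the six instances on physical box 1
    # (box_id is a hypothetical custom attribute name; anything consistent works)
    node.name: box1-node3        # unique per instance
    node.box_id: box1            # same value for every instance on this machine
    cluster.routing.allocation.awareness.attributes: box_id

Every instance on a box shares the box_id value, and every instance in the cluster lists the same awareness attribute; newer ES versions move custom attributes under node.attr.*, so treat the exact keys as version-dependent.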
So far as I've heard, this is sensible if heap is your bottleneck.
Obviously it doesn't help if disk space, IOPS, or CPU are your bottleneck.
The deb package doesn't support it without modification. Now you know all about
the subject that I do.
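For what it's worth, starting the extra instances by hand from a tarball install is roughly the sketch below; the paths and node name are hypothetical, and it assumes the ES 1.x startup script, where the heap comes from ES_HEAP_SIZE and any setting can be overridden with a -Des.* system property.

    # second instance on the same box, with its own config, data, and log paths
    ES_HEAP_SIZE=30g ./bin/elasticsearch \
        -Des.config=/etc/elasticsearch/node2/elasticsearch.yml \
        -Des.node.name=box1-node2 \
        -Des.path.data=/data/es/node2 \
        -Des.path.logs=/var/log/elasticsearch/node2

Each extra instance should bind to the next free ports (9201, 9301, and so on) on its own, so the main things to keep separate are the data and log directories.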
You should consider the RAM to CPU/disk ratio. On systems with huge memory,
the CPUs tend to be comparatively weak, and the I/O subsystem has to push
data from RAM to drive (spindle or SSD) under higher pressure.
Huge RAM helps with caching strategies but also creates headaches: large
caches must be long-lived and must not collapse, which is hard in a large
JVM heap, and JVM garbage collection will take more resources and time.
Running multiple JVMs on a single host only looks like a viable solution,
but that is not how ES scales. ES scales horizontally over many machines,
not vertically over RAM size.
So you should take care that your CPU performance does not suffer. There
is also overhead at the OS layer, and it depends on the setup.
A 36-node cluster on 6 machines adds another challenge: you must tell ES
how your nodes are organized in order to get reliable green/yellow/red
cluster health for your shard allocation.
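As a concrete example of what that organization buys you, checks along these lines (hypothetical host/port; the _cat API is ES 1.0+, older 0.90 clusters would read the routing table from _cluster/state instead) let you confirm that no two copies of a shard ended up on the same physical box:

    # overall health: green only if every replica could be allocated somewhere
    # that satisfies the awareness rules
    curl -s 'http://localhost:9200/_cluster/health?pretty'

    # list every shard copy and the node it lives on, to cross-check against
    # each node's box_id attribute
    curl -s 'http://localhost:9200/_cat/shards?v'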
Thanks Jörg, Mark and Nikolas, some great information here. The 6x6
configuration was something of a worst-case example; the farthest we'd
probably stretch it would be 3 nodes per host on 16-18 hosts, which should
be a little more reasonable. Hopefully we'll be able to set up a support
contract with the commercial side of ES and get some help building out a
system that meets our exact needs.