Multiple nodes on the same machine

I plan to use Elasticsearch as a document retrieval engine that will
serve hundreds of millions of documents, but the query rate will be low:
the ES cluster will probably receive only a few queries each hour.

We are planning to use EC2 m2.2xlarge instances, each with 32G of memory and 4
CPU cores, so I would like to run 4 ES nodes on each EC2 instance to maximize
CPU utilization. In this case, is it beneficial to run multiple nodes
on the same machine?
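One way to sanity-check a plan like this is a memory budget for a single machine. The minimal Python sketch below assumes every JVM costs a fixed non-heap overhead (thread stacks, code cache, direct buffers and so on) on top of its heap, and that some memory must stay free for the OS. The 1G overhead and 2G OS reserve are illustrative placeholders, not measured Elasticsearch figures, and the function names are hypothetical.

```python
def committed_memory_gb(nodes: int, heap_gb: float, jvm_overhead_gb: float = 1.0) -> float:
    """Memory committed by `nodes` JVMs, each with `heap_gb` of heap plus a
    fixed non-heap overhead (placeholder value, not a measured figure)."""
    return nodes * (heap_gb + jvm_overhead_gb)


def paging_risk(ram_gb: float, nodes: int, heap_gb: float,
                jvm_overhead_gb: float = 1.0, os_reserve_gb: float = 2.0) -> bool:
    """True if the JVMs plus an OS reserve (placeholder) exceed physical RAM,
    meaning the OS would have to page (swap) to satisfy them."""
    needed = committed_memory_gb(nodes, heap_gb, jvm_overhead_gb) + os_reserve_gb
    return needed > ram_gb


if __name__ == "__main__":
    ram = 32.0  # the 32G EC2 instance from the question
    # Four nodes with 8G heaps each: 4 * (8 + 1) + 2 = 38 > 32
    print(paging_risk(ram, nodes=4, heap_gb=8))   # True
    # One node with a 16G heap: 1 * (16 + 1) + 2 = 19 <= 32
    print(paging_risk(ram, nodes=1, heap_gb=16))  # False
    # The same 16G of total heap split four ways pays the overhead four times.
    print(committed_memory_gb(4, 4), committed_memory_gb(1, 16))  # 20.0 17.0
```

The point of the sketch is only that per-JVM overhead scales with the number of JVMs, while the total RAM does not.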

My own experience with Solr is that it does help to use resources more
efficiently.

Regards,
Ming

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

No, it is not beneficial.

Here are the reasons:

a) If you start many JVMs, you create JVM-induced overhead: the JVMs
compete for the resources the OS provides (CPU, network, memory).
Because the OS must decide which JVM gets which resources, scheduling
takes more time and space, and this overhead is not negligible. The
more JVMs you run in parallel, the higher the risk of overall system
degradation, and in many cases the risk of paging (swapping) rises too.

b) The ES code is optimized for scalability. What does that mean? You
can raise the ES JVM's settings for CPU (threads), memory (heap) and
network (Netty pools), and the node's overall capacity grows as far as
your machine's resources allow. There is no reason not to dedicate a
whole machine to a single ES node.

c) A single ES JVM can manage hundreds or thousands of Lucene indexes at
once (every shard is a Lucene index), thanks to index sharding and
automatic workload distribution. Each node can hold many indices, each
with many shards. An ES node does not restrict you to a model of a
single index with a single shard.
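Point c) relies on document routing: in its basic form, ES assigns each document to a primary shard as hash(routing) % number_of_primary_shards, where the routing value defaults to the document ID. The Python sketch below mirrors that rule but uses CRC32 as a stand-in hash; Elasticsearch's actual hash function is different, so the shard numbers will not match a real cluster. It also shows why the primary-shard count cannot simply be changed later: a different modulus re-routes most documents.

```python
import zlib
from collections import Counter


def shard_for(routing: str, num_primary_shards: int) -> int:
    """Basic routing rule: hash(routing) % number_of_primary_shards.
    CRC32 is a stand-in; Elasticsearch uses its own hash function.
    (Python's built-in hash() is avoided because it is salted per process.)"""
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


def shard_counts(doc_ids, num_primary_shards: int) -> Counter:
    """Count how many documents land on each primary shard."""
    return Counter(shard_for(d, num_primary_shards) for d in doc_ids)


def moved_fraction(doc_ids, old_shards: int, new_shards: int) -> float:
    """Fraction of documents whose shard changes if the shard count changes."""
    ids = list(doc_ids)
    moved = sum(shard_for(d, old_shards) != shard_for(d, new_shards) for d in ids)
    return moved / len(ids)


if __name__ == "__main__":
    ids = [f"doc-{i}" for i in range(100_000)]
    # One index, ten primary shards: roughly 10,000 documents each.
    for shard, n in sorted(shard_counts(ids, 10).items()):
        print(shard, n)
    # Going from 10 to 11 shards would re-route most documents.
    print(f"moved: {moved_fraction(ids, 10, 11):.0%}")
```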

Jörg

On 20.03.13 00:35, mfyang@wisewindow.com wrote:

In this case, is it beneficial to run multiple nodes on same machine?

My own experience with Solr is that it does help to use resources more
efficiently.

Jorg,

Thanks for the info, very useful. So basically I can run one ES instance
that holds multiple shards, and once the shards get big, I can migrate
them to separate machines?

Thanks,
Ming
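The follow-up question is close to how ES scales out, with one caveat: the number of primary shards is fixed when an index is created, so shards are not split as they grow; instead, whole shards are relocated when new nodes join the cluster. The Python sketch below is a hypothetical, much simplified balancer that only evens out shard counts per node, to illustrate the idea. Elasticsearch's real allocator takes other factors into account, and the node and shard names here are made up.

```python
from typing import Dict, List


def rebalance(allocation: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Move whole shards from the fullest node to the emptiest node until
    shard counts differ by at most one. Hypothetical, simplified balancer;
    the input mapping is not modified."""
    alloc = {node: list(shards) for node, shards in allocation.items()}
    while True:
        fullest = max(alloc, key=lambda n: len(alloc[n]))
        emptiest = min(alloc, key=lambda n: len(alloc[n]))
        if len(alloc[fullest]) - len(alloc[emptiest]) <= 1:
            return alloc
        # Relocate one shard intact; shards are moved, never split.
        alloc[emptiest].append(alloc[fullest].pop())


if __name__ == "__main__":
    # Start: one machine, one ES node holding all 6 primary shards.
    cluster = {"node-1": [f"shard-{i}" for i in range(6)]}
    # Later, two more machines (one node each) join the cluster.
    cluster["node-2"] = []
    cluster["node-3"] = []
    for node, shards in rebalance(cluster).items():
        print(node, shards)
```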

On Tuesday, March 19, 2013 5:18:02 PM UTC-7, Jörg Prante wrote:

Totally agree for 32G machines, but as memory gets cheaper and cheaper, I'm
curious whether anyone has actually benchmarked or stress-tested single-node
vs. multi-node setups on large-memory machines.

We actually run 2 nodes (22G each) on our 64G machines.

a) so we can use -XX:+UseCompressedOops
b) on the (untested) theory that GC pauses will be shorter/less frequent/...
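The arithmetic behind a): with compressed oops, the JVM stores object references as 32-bit offsets scaled by the object alignment (8 bytes by default), so it can address at most 2^32 × 8 bytes = 32 GiB of heap; in practice the JVM turns compressed oops off somewhat below that limit. A small Python sketch of the check, plus the memory left for the OS and file-system cache on the 64G machines described above. Treating the "G" figures as GiB is an assumption, and the function names are hypothetical.

```python
GIB = 1024 ** 3


def compressed_oops_limit_bytes(object_alignment: int = 8) -> int:
    """Largest heap addressable by 32-bit compressed references:
    2**32 distinct offsets, each scaled by the object alignment."""
    return (2 ** 32) * object_alignment


def fits_compressed_oops(heap_gib: float, object_alignment: int = 8) -> bool:
    """Theoretical check only; real JVMs switch compressed oops off a bit
    below this limit, so leave some margin."""
    return heap_gib * GIB < compressed_oops_limit_bytes(object_alignment)


if __name__ == "__main__":
    print(compressed_oops_limit_bytes() // GIB)   # 32
    print(fits_compressed_oops(22))               # True: each 22G node
    print(fits_compressed_oops(44))               # False: one node with both heaps
    ram_gib, nodes, heap_gib = 64, 2, 22
    print(ram_gib - nodes * heap_gib)             # 20 GiB left for OS and page cache
```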

On Tuesday, March 19, 2013 8:18:02 PM UTC-4, Jörg Prante wrote:

I have heard that multiple cores are best utilized by separate processes
rather than separate threads.
If that is true and the machine has many cores, wouldn't multiple instances
be a good idea?

Thanks
Vineeth

On Wed, Mar 20, 2013 at 6:13 PM, Andy Wick andywick@gmail.com wrote:
