When designing a dedicated ES cluster, does it make more sense to have two
or three servers with a ton of resources each, or 5-10 cheaper commodity
hardware systems?
I know that when an HTTP request comes into a given machine, it can
automatically be routed to another. Does this routing happen over port 9200 or
9300? I'm picturing having the cheap machines connected both to our
internal network for 9200, and to each other on a private 10.0.0.0 network
for 9300. Would that result in a performance boost?
On Mon, Dec 9, 2013 at 2:47 PM, Josh Harrison hijakk@gmail.com wrote:
When designing a dedicated ES cluster, does it make more sense to have two
or three servers with a ton of resources each, or 5-10 cheaper commodity
hardware systems?
I'd go commodity to get more RAM. The sweet spot (I've heard) is 64 GB per
machine, with each one running a 30 GB heap.
I know when an http request comes into a given machine, it can
automatically be routed to another. Does this routing happen over 9200 or
9300?
9300, normally: node-to-node traffic goes over the transport protocol on
9300, while 9200 is the HTTP port. If you run more than one ES instance on
a node, the second one will run on 9301, I believe.
I'm picturing having the cheap machines connected to both our internal
network for 9200, and to each other on a private 10.0.0.0 network for 9300.
Would that result in a performance boost?
I think there are too many variables here to say. As long as you have fast,
switched connections between the machines, you should be OK.
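If you do want to split HTTP and transport traffic across interfaces, the two listeners can be bound separately in elasticsearch.yml. A minimal sketch of the idea; the 192.168.x and 10.0.0.x addresses are made up, and the exact setting names can vary by ES version:

```yaml
# elasticsearch.yml -- hypothetical two-network layout
# HTTP (port 9200) bound to the internal, client-facing network
http.host: 192.168.1.5

# transport (port 9300) bound to the private node-to-node network
transport.host: 10.0.0.5

# explicit ports, matching the defaults
http.port: 9200
transport.tcp.port: 9300
```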
Elasticsearch currently doesn't know how to handle nodes of significantly
different power automatically. It won't balance shards based on machine
power, but there are things you can do to help with that.
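One lever people use for uneven hardware, as a sketch rather than an exhaustive list: tag nodes with an arbitrary attribute in elasticsearch.yml, then steer allocation per index with filtering, or cap how many shards of an index any one node can hold. The `box_type` attribute name here is made up:

```yaml
# elasticsearch.yml on a beefier node (the attribute name is arbitrary)
node.box_type: strong
```

```yaml
# index-level settings (e.g. applied via the update-settings API):
# keep this index's shards on the strong boxes...
index.routing.allocation.require.box_type: strong
# ...and spread them out so no single node takes too many
index.routing.allocation.total_shards_per_node: 2
```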
No more than 64 GB per machine (with a heap just under 32 GB) is best, as
Java can't use compressed object pointers (compressed oops) on heaps over
~32 GB, which means you lose out. We were running a cluster of 8 nodes with
512 GB per node and the resource wastage was immense, as was the GC! Not to
mention Java wouldn't even run with a 256 GB heap size.
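The ~32 GB figure falls out of simple arithmetic: a compressed oop is a 32-bit offset scaled by HotSpot's default 8-byte object alignment, so it can only address 2^32 × 8 bytes of heap. A quick back-of-the-envelope check:

```python
# Addressable heap under compressed oops: a 32-bit offset,
# scaled by HotSpot's default 8-byte object alignment.
oop_bits = 32
object_alignment = 8  # bytes

max_heap_bytes = (2 ** oop_bits) * object_alignment
print(max_heap_bytes // 2 ** 30)  # -> 32 (GiB)
```

Go past that and the JVM falls back to full 64-bit pointers, so a 40 GB heap can actually hold fewer objects than a 31 GB one.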
What performance are you asking for: maximum speed for executing a single
query, or maximum throughput of the overall system across all queries?
In general, ES is not designed for vertical scaling on a few big oomph
machines. ES is designed to scale out horizontally over lots of commodity
machines of the same type. Note that adding machines won't make an
individual query faster, but you do get higher overall throughput.
Great, thanks all. Better throughput is the goal. I'll have to see if I can
scrounge some decent systems up!
Hm, OK, so ES may not deal with substantially different machine speeds, but
if I throw in a bunch of older systems with only a few GB of RAM and a few
hundred GB of storage space, is ES aware of the space constraints -
distributing shards and replicas so that they don't hit the storage
capacity limit right away?
Thanks,
Josh
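For what it's worth, ES does ship a disk-based allocation decider (added around this era, though whether it's enabled by default depends on the version) that stops placing new shards on nodes past a free-space watermark. A hedged sketch of the cluster settings involved; the percentages are illustrative:

```yaml
# elasticsearch.yml (or via the cluster settings API) --
# assumes the disk-threshold allocation decider is available in your version
cluster.routing.allocation.disk.threshold_enabled: true
# stop allocating new shards to a node above 85% disk usage
cluster.routing.allocation.disk.watermark.low: 85%
# start relocating shards off a node above 90% disk usage
cluster.routing.allocation.disk.watermark.high: 90%
```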