I have done a little experimenting in that area, but the material is
too large for my tight time schedule to write it up as a reasonable
blog post or something similar - I know, that's awful.
Here is the essence of my thoughts on your questions.
CPU to shard ratio: a shard is a Lucene index, so the answer depends on
the size of the shards you create and the power of the CPU for Lucene
queries. I recommend shard sizes around 1 GB so they can be handled by
replication / recovery in a few seconds. The network bandwidth must be
there for that. Once the cluster is up, check your query load
distribution. Then my rule of thumb is "one shard per CPU core". The
basic idea is that you can put load on the CPU and it can execute requests
on every shard on the node without delay. In reality, Lucene is
multithreaded, and not every shard is involved in queries, so you can
put more shards on a node. A node should be able to serve around
100-1000 qps with current CPU cores and network interfaces (total
turnaround time with all the latencies; I measured 250-500 qps for average
random term queries on an AMD Opteron 4170 six-core - YMMV with
large docs and mappings). The advantage of ES is that you can take
bread-and-butter CPUs and spread the load over many machines very easily
by adding nodes to the ES cluster, without compromising response time.
Another performance factor might be differing shard sizes on a node,
depending on the index sizes, but there is not much you can do about that
unless you can ensure all indexes are equally sized, which is far
from reality.
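To make the rule of thumb concrete, here is a minimal sketch of creating
an index sized along those lines. The index name "test" and the node/core
counts are just assumptions for illustration, not a recommendation for your
data:

    # assumption: 4 data nodes with 8 cores each -> up to 32 primary shards,
    # then adjust downward until the average shard size lands around 1 GB
    curl -XPUT 'http://localhost:9200/test' -d '{
      "settings" : {
        "index" : {
          "number_of_shards" : 32,
          "number_of_replicas" : 1
        }
      }
    }'

    # once data is flowing in, verify shard allocation and sizes
    curl -XGET 'http://localhost:9200/_cluster/health?level=shards&pretty=true'
    curl -XGET 'http://localhost:9200/test/_stats?pretty=true'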
IOPS to disk size ratio: with today's disk sizes you won't have to care
much about capacity if you have a suitable SAS2/SATA3 RAID controller
(6 Gbit/s) and enough PCIe lanes for disk transfer. That is, use the disk
sizes you can get that fit your HBA (currently available sizes are
500 GB and up). Due to your layout of the ES cluster for massive
indexing, you may have to cope with write IO challenges, which often
originate in a badly balanced CPU/disk performance ratio in the
hardware of a node. Read IO challenges are rare because you can ramp up
vast amounts of RAM and let the ES process mmap the index files. Plus,
Java NIO is well supported by the OS filesystem (here, JRE 7 NIO.2
provides more interesting features than JRE 6 NIO, but that is another
topic, mainly relevant for Netty). Adding fast disks to your system helps a lot. IOPS
does not matter much anymore with SSDs - the numbers go through the roof.
Run your ES indexes on SSD and you will never want to go back to spindle
disks. Small SSDs can be configured as RAID 0 for best write performance. I measured 1.5
GBytes/s reads and 800 MBytes/s writes with four 128 GB
Plextor M5S SSDs in a RAID 0 on an LSI 2008 HBA. There is no need to set
up RAID 1 or higher because of the built-in redundancy of the ES replica
setting. If a disk subsystem fails, let the whole node happily fail -
you still have the cluster up and running.
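For illustration, a minimal sketch of relying on the ES replica setting
instead of RAID redundancy (again, the index name "test" is just a
placeholder): with at least one replica per shard on another node, the
cluster reallocates shards by itself after a node failure.

    # ensure every primary shard has at least one replica on another node
    curl -XPUT 'http://localhost:9200/test/_settings' -d '{
      "index" : { "number_of_replicas" : 1 }
    }'

    # after a node (or its disk subsystem) dies, watch the cluster recover:
    # it goes yellow while replicas are promoted and reallocated, then green
    curl -XGET 'http://localhost:9200/_cluster/health?wait_for_status=green&timeout=60s&pretty=true'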
If you have more questions, just ask
Jörg
On 16.05.13 21:30, Wojons Tech wrote:
I have really been looking for one place that gives some great config
setting suggestions for us ops guys and how to set up an
elasticsearch cluster: CPU to shard ratios, how many replicas to have,
IOPS to disk size ratio, that sort of interesting stuff.