Well, the only reason to have more than one shard per node is to support
eventual growth/scaling. If your data is fixed and not growing, you should
plan on a single shard per server from the start, since it is more
efficient. That's a generalization, but true more often than not.
The idea is to capacity plan how much a single shard is capable of
holding/servicing so that you can use it to extrapolate. E.g. "In six
months, we expect to have 500m docs and 1000 queries per second". Based on
the test I outlined, you find out that it will require 30 servers to meet
your SLA. But today you only have 30m docs, so you can get away with 3
servers and just add new nodes as your data increases. It's a way to take
guesswork out of scaling.
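The extrapolation described above is just arithmetic. Here is a minimal sketch of it in Python; the per-shard capacity figure is hypothetical and stands in for whatever your own single-shard test measures:

```python
import math

# Hypothetical result of the single-shard capacity test:
# one shard on one node stays within SLA up to ~17M docs.
DOCS_PER_SHARD = 17_000_000

def servers_needed(total_docs, docs_per_shard, shards_per_node=1):
    """Round up to the number of nodes needed to hold total_docs."""
    shards = math.ceil(total_docs / docs_per_shard)
    return math.ceil(shards / shards_per_node)

# Expected in six months: 500M docs.
print(servers_needed(500_000_000, DOCS_PER_SHARD))  # 30

# Today: only 30M docs, so start small and add nodes as data grows.
print(servers_needed(30_000_000, DOCS_PER_SHARD))   # 2
```

In practice you would also factor in query throughput (e.g. the 1000 queries/second target), since the per-shard limit is whichever of indexing rate, query latency, or memory gives out first.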
Obviously, more shards per node will impact performance somewhat, but that's
why you plan to add more nodes as performance lags.
Hope that helps!
PS. I personally use Marvel, but the other plugins/APIs mentioned are very
good as well.
On Friday, March 21, 2014 12:56:56 PM UTC-4, Rajan Bhatt wrote:
So this test will tell us how much a single node with a single shard can
handle. If we want to deploy more shards per node, then we need to take
into consideration that more shards per node would consume more resources
(file descriptors, memory, etc.) and performance would degrade as more
shards are added to the node.
This is tricky, and mileage can vary with different workloads (indexing +
searching).
If possible, could you describe your deployment at a very high level
(number of ES nodes + number of indices + shards + replicas) to give some
idea?
I appreciate your answer and your time.
By the way, which tool do you use for monitoring the ES cluster, and what
do you monitor?
On Thursday, March 20, 2014 2:05:52 PM UTC-7, Zachary Tong wrote:
Unfortunately, there is no way that we can tell you an optimal number.
But there is a way that you can perform some capacity tests, and arrive at
usable numbers that you can extrapolate from. The process is very simple:
- Create a single index, with a single shard, on a single node
- Start indexing *real, production-style* data. "Fake" or "dummy"
data won't work here; it needs to mimic real-world data
- Periodically, run real-world queries that you would expect users to run
- At some point, you'll find that performance is no longer acceptable
to you. Perhaps the indexing rate becomes too slow, or query
latency is too high, or your node simply runs out of memory
- Write down the number of documents in the shard, and the physical
size of the shard
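For the first step, the test index just needs one primary shard and no replicas, so every measurement reflects a single shard on a single node. A minimal sketch of the settings (the index name "capacity-test" and the local URL are assumptions, not from the thread):

```python
import json

# Settings for the capacity-test index: one primary shard, zero replicas,
# so the whole index lives in a single shard on a single node.
settings = {
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0,
    }
}

# Equivalent request against your own cluster:
#   curl -XPUT 'http://localhost:9200/capacity-test' -d '<this JSON>'
print(json.dumps(settings))
```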
Now you know the limit of a single shard given your hardware + queries +
data. Using that knowledge, you can extrapolate given your expected
search/indexing load, and how many documents you expect to index over the
next few years, etc.
On Thursday, March 20, 2014 3:29:47 PM UTC-5, Rajan Bhatt wrote:
I would appreciate if someone can suggest optimal number of shards per
ES node for optimal performance or any recommended way to arrive at number
of shards given number of core and memory foot print.
Thanks in advance