Optimal Number of Shards

Hi,

Let's say I have a 100-node cluster, each node with a 16G heap (50% of the
RAM). I have a single index; it's 1G, and I know it won't grow much (it will
never grow above the heap size).

  1. Is having a single shard better than having the default of 5? To spread
    the load, I would then run 99 replicas.

  2. If index size < heap size, is all that heap memory wasted? I mean, let's
    say I have a 5G index: would it be better to have 4 nodes with 5G heap or
    2 with 10G heap?

Thank you,

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/be8255a3-28c3-4754-ac17-c94e2b16afaf%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

1GB is small; it's very easy to make it fit entirely in the filesystem
cache. For this kind of small index, where you want to optimize search
throughput, just have one shard per index and have it replicated once per
node (option 1 that you described).
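
As a rough illustration of option 1, here is a minimal sketch that builds the
settings body for an index with one primary shard and one copy per node. The
helper name replicated_index_settings is made up for this example;
number_of_shards and number_of_replicas are the standard index settings. How
you send the body when creating the index is up to your client.

```python
import json


def replicated_index_settings(node_count: int) -> dict:
    """Build index settings for one primary shard plus one copy per node.

    Hypothetical helper for illustration only: with N nodes, one primary
    and N - 1 replicas put a full copy of the index on every node.
    """
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return {
        "settings": {
            "index": {
                "number_of_shards": 1,
                "number_of_replicas": node_count - 1,
            }
        }
    }


# For the 100-node cluster from the question: 1 primary + 99 replicas.
body = replicated_index_settings(100)
print(json.dumps(body, indent=2))
```

Note that number_of_replicas can be changed on a live index, while
number_of_shards is fixed at index creation time, so the shard count is the
value to get right up front.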

Regarding 2: indeed, with such a small index you will probably not need 16GB
per machine. Having several nodes per machine would not help, since the
bottleneck would be CPU (not disk, since everything fits in the FS cache, and
not memory, since you have much more memory than your index size), and a
single node can already make use of all your CPUs.
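
To make the memory argument concrete, here is a back-of-the-envelope sketch
using the numbers from the question (16G heap at 50% of RAM means 32G per
machine). It assumes, optimistically, that all RAM outside the heap is
available to the filesystem cache; the function name is made up for this
example.

```python
def fs_cache_headroom_gb(ram_gb: float, heap_gb: float, index_gb: float) -> float:
    """Return the RAM left over after the heap and a full index copy.

    Simplifying assumption: everything outside the JVM heap can be used
    by the OS filesystem cache. Real machines reserve some of it for the
    OS and other processes, so treat the result as an upper bound.
    """
    cache_gb = ram_gb - heap_gb
    return cache_gb - index_gb


# One node from the question: 32G RAM, 16G heap, full copy of the 1G index.
headroom = fs_cache_headroom_gb(ram_gb=32, heap_gb=16, index_gb=1)
print(f"Headroom after caching the whole index: {headroom} GB")
```

A positive result means the whole index can sit in the filesystem cache, which
is why disk is not expected to be the bottleneck here.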

For more general considerations about shard sizing, the following chapter
of the reference guide gives practical advice on picking the right shard
size and number of shards:

On Fri, Feb 27, 2015 at 7:46 PM, Daniel Gligorov gligorov.daniel@gmail.com
wrote:

Hi,

Let's say I have a 100-node cluster, each node with a 16G heap (50% of the
RAM). I have a single index; it's 1G, and I know it won't grow much (it will
never grow above the heap size).

  1. Is having a single shard better than having the default of 5? To spread
    the load, I would then run 99 replicas.

  2. If index size < heap size, is all that heap memory wasted? I mean, let's
    say I have a 5G index: would it be better to have 4 nodes with 5G heap or
    2 with 10G heap?

Thank you,


--
Adrien Grand
