We're about to stand up an elasticsearch cluster and we're facing the task
of determining the correct number of shards to allocate for our single
index. We have 30 servers (16cpus, 48gb memory), each of which hosting one
node. The concern is only with query performance. Indexing performance is
not important.
In our research on elasticsearch, it seems that one of the biggest for
query performance is being able to fit the entire index in memory.
However, we've also seen where having less shards / more replicas will also
increase query performance.
I see two paths, both assuming a single shard per node:
OPTION 1: allocate 30 primary shards - the shards of the index will be
small enough to fit in OS cache on each node, but every query will have to
hit 30 shards. ( there's no great way to utilize "routing" here. Assume
each query will hit all shards.)
OR
OPTION 2: allocate 10 primary shards, with 2 replicas - the shards of the
index will NOT be small enough to fit in OS cache on each node, but a query
will only have to hit 10 of the shards, distributed among the 30 nodes.
So, generally speaking, which is more important to query performance?
Being able to fit the entire index in OS cache? Or less shards / more
replicas?
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/325f920a-8676-44c4-b272-05381c3078ff%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.