I'm going through a cluster sizing exercise and I'm looking for some advice.
I have tested the target hardware with a single shard, loading production
data and simulating some of our actual queries with Gatling. Our heaviest
queries are a blend of top_children queries with filters, match_all queries
with query_string filters, and query_string queries with has_parent
query_string filters. From those tests I've narrowed down an optimal shard
size, which should give me the target number of shards for my index:
num_shards = index_size / max_shard_size.
Some other notes:
- memory on the box is split 50/50 between JVM heap and OS
- the available OS page cache memory can maybe fit two shards' worth of data
(see the sizing sketch after these notes)
- I have plenty of room left for the field and filter caches
- heap and GC seem to be behaving nicely with a slow and consistent step
curve in Bigdesk
- tests seem to show the box is CPU bound rather than disk IO bound: iostat
shows no disk reads during the high-load periods, Bigdesk shows most of the
time spent in user space, and the highest activity seems to be around
young-generation collections
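To make the page cache note concrete, here is the back-of-the-envelope
arithmetic as a small Python sketch; every number in it is a made-up
placeholder, not a real measurement from my tests:

# All sizes in GB; all values are placeholders.
index_size = 300          # total size of the production index
max_shard_size = 15       # optimal shard size found in the single-shard tests
ram_per_node = 64         # physical memory per node
heap = ram_per_node // 2  # 50/50 split between JVM heap and the OS

# Target shard count from the formula above, rounded up.
num_shards = -(-index_size // max_shard_size)   # ceil(300/15) = 20

# What's left for the OS page cache, and how many whole shards fit in it.
page_cache = ram_per_node - heap                # 32
shards_in_cache = page_cache // max_shard_size  # 2, as in my tests

# Node count if shards per node are capped at what the page cache holds.
nodes_needed = -(-num_shards // shards_in_cache)  # ceil(20/2) = 10
print(num_shards, shards_in_cache, nodes_needed)

With placeholder numbers like these, capping shards per node at what the page
cache can hold turns a 20-shard index into a 10-node cluster, which is
exactly the cost blow-up behind my second question below.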
At this point, what are some of the most important things to consider in
terms of how many of these shards to co-locate on the same node?
- Would adding more shards on the same box negate the tests I just ran?
- Do I limit the number of shards to how many can fit in the page cache on
the node? That really blows up the number of nodes I will need, which is
pretty costly. Is this approach really worth it?
- Since I'm CPU bound, shouldn't I be more worried about core count if I want
more shards per box?
Any pointers are welcome.