Optimizing shard size and count per node

I'm going through a cluster sizing exercise and I'm looking for some
pointers.

I have tested the target hardware with a single shard, loading production
data and simulating some of our actual queries using Gatling. Our heaviest
queries are a blend of top_children with filters, match_all with
query_string filters, and query_string queries with has_parent
query_string filters. Based on those tests I've narrowed down an optimal
shard size, which should give me the target number of shards for my index
(num_shards = index_size / max_shard_size). Some other notes:

  • memory on the box is split 50/50 between JVM heap and OS.
  • the available OS page cache memory can fit roughly two shards' worth
    of data
  • I have plenty of room left for field and filter caches
  • heap and GC seem to be behaving nicely with a slow and consistent step
    curve in Bigdesk
  • tests seem to show the box is CPU bound rather than disk IO bound:
    iostat does not show any disk reads during the high load times, Bigdesk
    shows most time spent in user space, and the highest activity seems to
    be around young generation collections
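
For reference, the shard count estimate mentioned above can be sketched
roughly like this. The sizes are made-up placeholders, not measured values,
the function name is just for illustration, and rounding up is an assumption
so that no shard ends up larger than the tested maximum:

```python
import math


def target_shard_count(index_size_gb: float, max_shard_size_gb: float) -> int:
    """num_shards = index_size / max_shard_size, rounded up (assumption)."""
    if index_size_gb <= 0 or max_shard_size_gb <= 0:
        raise ValueError("sizes must be positive")
    return math.ceil(index_size_gb / max_shard_size_gb)


# Hypothetical numbers, for illustration only.
print(target_shard_count(600, 25))  # 24
print(target_shard_count(610, 25))  # 25
```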

At this point, what are some of the most important things to consider in
terms of how many of these shards to co-locate on the same node?

  • Would adding more shards on the same box negate the tests I just ran?
  • Should I limit the number of shards per node to what fits in the page
    cache on that node? That really blows up the number of nodes I will
    need, which is pretty costly. Is this approach really worth it?
  • Since I'm CPU bound, shouldn't I be looking at more cores if I want
    more shards per box?
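
To make the cost question concrete, here is a rough sketch of how the node
count changes with the number of shards placed on each node. The shard and
per-node figures are hypothetical, the helper name is made up, and it only
counts nodes; it says nothing about whether the CPU can keep up with the
denser layout:

```python
import math


def nodes_needed(num_shards: int, shards_per_node: int) -> int:
    """Nodes required to host num_shards at a fixed number of shards per node."""
    if num_shards <= 0 or shards_per_node <= 0:
        raise ValueError("counts must be positive")
    return math.ceil(num_shards / shards_per_node)


# Hypothetical: 24 shards in total.
# Limit each node to the two shards that fit in page cache:
print(nodes_needed(24, 2))  # 12
# Pack six shards per node instead:
print(nodes_needed(24, 6))  # 4
```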

Any pointers are welcome.
