I'm going through a cluster sizing exercise and I'm looking for some advice.
I have tested the target hardware with a single shard, loading production
data and simulating some of our actual queries with Gatling. Our heaviest
queries are a blend of top_children queries with filters, match_all queries
with query_string filters, and query_string queries with has_parent
query_string filters. From those tests I've narrowed down an optimal shard
size, which should give me the target number of shards for my index:
num_shards = index_size / max_shard_size.
Some other notes:
- memory on the box is split 50/50 between JVM heap and OS
- the available OS page cache memory can maybe fit two shards' worth of data
(see the sizing sketch after these notes)
- I have plenty of room left for the field and filter caches
- heap and GC seem to be behaving nicely with a slow and consistent step
curve in Bigdesk
- tests seem to show the box is CPU bound rather than disk IO bound: iostat
shows no disk reads during the high-load periods, Bigdesk shows most of the
time spent in user space, and the highest activity seems to be around
young-generation collections
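To make the page cache note concrete, here is the back-of-the-envelope
arithmetic as a small Python sketch; every number in it is a made-up
placeholder, not a real measurement from my tests:

# All sizes in GB; all values are placeholders.
index_size = 300          # total size of the production index
max_shard_size = 15       # optimal shard size found in the single-shard tests
ram_per_node = 64         # physical memory per node
heap = ram_per_node // 2  # 50/50 split between JVM heap and the OS

# Target shard count from the formula above, rounded up.
num_shards = -(-index_size // max_shard_size)   # ceil(300/15) = 20

# What's left for the OS page cache, and how many whole shards fit in it.
page_cache = ram_per_node - heap                # 32
shards_in_cache = page_cache // max_shard_size  # 2, as in my tests

# Node count if shards per node are capped at what the page cache holds.
nodes_needed = -(-num_shards // shards_in_cache)  # ceil(20/2) = 10
print(num_shards, shards_in_cache, nodes_needed)

With placeholder numbers like these, capping shards per node at what the page
cache can hold turns a 20-shard index into a 10-node cluster, which is
exactly the cost blow-up behind my second question below.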
At this point, what are some of the most important things to consider in
terms of how many of these shards to co-locate on the same node?
- Would adding more shards on the same box negate the tests I just ran?
- Do I limit the number of shards to how many can fit in the page cache on
the node? That really blows up the number of nodes I will need, which is
pretty costly. Is this approach really worth it?
- Since I'm CPU bound, shouldn't I be more worried about core count if I want
more shards per box?
Any pointers are welcome.