Adding shards to reduce average shard size


I have been struggling off and on for months now to run a heavily
updated ES cluster on as little hardware as possible. There's more
history here,
but the overall issue is that my use case involves lots of frequent
updates, which results in a growing percentage of deleted docs, and thus
more RAM needed to handle the same dataset. I want to run a possible
change in index structure by the group to get recommendations.

Ultimately the growing delete percentage seems to be related to both the
size of the shards in question, and a hard-coded ratio in the Lucene Tiered
Merge Policy code. As it turns out, the merge code will not consider
merging a segment until the non-deleted portion of the segment is smaller
than half of the max_merged_segment parameter (someone please correct me if
I'm wrong here, and I wish it had been easier to discover this). This does
not appear to be configurable in any version of Lucene in which I've
looked. This explains why bumping max_merged_segment stupidly high can
keep the deleted percentage low: with a larger ceiling, big segments
become eligible for merging at a lower deleted-doc percentage. It also
explains why my shards grew to ~45% deleted docs.
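If I've read the merge policy right, the rule can be sketched numerically. This is a minimal model of my reading of TieredMergePolicy, not actual Lucene code; the function name and thresholds are illustrative:

```python
# Simplified model of the half-size rule described above: a segment is
# only considered for merging once its live (non-deleted) bytes drop
# below max_merged_segment / 2.
def merge_eligible(segment_bytes, deleted_fraction, max_merged_segment_bytes):
    """True once the segment's live bytes fall under half the ceiling."""
    live_bytes = segment_bytes * (1.0 - deleted_fraction)
    return live_bytes < max_merged_segment_bytes / 2.0

GB = 1 << 30
# A full-sized 5 GB segment under the default 5 GB max_merged_segment
# is NOT eligible at 40% deleted (3 GB live > 2.5 GB threshold) ...
print(merge_eligible(5 * GB, 0.40, 5 * GB))   # False
# ... and only becomes eligible past the 50% mark:
print(merge_eligible(5 * GB, 0.55, 5 * GB))   # True
# Raising max_merged_segment to 15 GB makes the same segment eligible
# while it is still mostly live docs (3 GB live < 7.5 GB threshold):
print(merge_eligible(5 * GB, 0.40, 15 * GB))  # True
```

This is exactly the behavior I was seeing: with segments capped near the default ceiling, deletes had to approach 50% of a segment before it would merge.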

To work around this, I'm considering reindexing with 3 or 4 times the
number of shards on the same number of nodes to shrink the shards to a more
manageable size, but I want to make sure that the overhead from the
additional shards won't cancel out the benefits of having smaller
shards. Note that I AM using routing, which I assume should offset much
of the usual over-sharding overhead at query time.

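For concreteness, the target index I have in mind would look something like this. The index name, host, and the 4x multiplier below are placeholders, not settled choices:

```python
# Sketch only: the multiplier and names are placeholders.
def target_settings(current_primaries, multiplier, replicas=1):
    """Settings body for the reindex target: more, smaller primaries."""
    return {
        "settings": {
            "number_of_shards": current_primaries * multiplier,
            "number_of_replicas": replicas,
        }
    }

body = target_settings(current_primaries=6, multiplier=4)
print(body["settings"]["number_of_shards"])  # 24

# With the elasticsearch-py client this body would go to index creation,
# e.g. es.indices.create(index="myindex_v2", body=body), with documents
# copied over via scan/scroll + bulk, keeping the same routing values so
# each routed query still hits a single shard.
```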
The index in question now has six shards with one replica each. Shard
sizes range from 20 to 30 GB. Currently this is running on six r3.xlarge
(30.5 GB RAM, 20GB allocated to ES) instances. Ideally I'd be running on
no more than four. With a low enough delete percentage I think I can even
run on three.

My theory is that if I reduce the size of the shards to 5-10 GB max, then I
can set the max_merged_segment high enough that the deleted docs will be
merged out quickly, and I'll be able to handle the size of the merges in
question without issue. With the existing setup, even on six nodes, I
occasionally run into merge memory issues with large max_merged_segment
sizes (13 GB currently I think).
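Put differently, if the half-size rule holds, keeping every segment permanently merge-eligible just means setting max_merged_segment to at least twice the largest shard (again, this follows from my reading of the merge policy, so take the rule of thumb with a grain of salt):

```python
GB = 1 << 30

def min_max_merged_segment(max_shard_bytes):
    """Smallest max_merged_segment that keeps every segment in a shard
    below the half-size threshold, assuming the rule above is right."""
    return 2 * max_shard_bytes

# With shards capped at ~10 GB, a ~20 GB ceiling keeps every segment
# eligible for merging no matter its delete percentage:
print(min_max_merged_segment(10 * GB) // GB)  # 20
```

The setting itself is dynamic, so it can be raised on a live index (e.g. via a put-settings call with `index.merge.policy.max_merged_segment`), but the merges it permits still have to fit in heap, which is the problem I keep hitting at current shard sizes.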

Does this seem reasonable? I'd be looking at 36-48 shards, counting
replicas, spread across 3-4 nodes, so roughly 9-16 per node. Thoughts?
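A quick back-of-the-envelope check of those per-node counts (starting from today's six primaries with one replica each):

```python
def per_node(multiplier, nodes, primaries=6, replicas=1):
    """Total shard copies and shards per node for a given layout."""
    total = primaries * multiplier * (1 + replicas)
    return total, total / nodes

# Every multiplier / node-count combination under consideration:
for m in (3, 4):
    for n in (3, 4):
        total, each = per_node(m, n)
        print(f"{m}x primaries on {n} nodes: {total} shards, {each:.0f}/node")
```

The extremes are 9 per node (3x primaries on four nodes) and 16 per node (4x primaries on three nodes).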

You received this message because you are subscribed to the Google Groups "elasticsearch" group.