I am building a large Elasticsearch cluster that eventually needs to handle 1,000,000 index requests per second. I am currently scaling toward that point (at about 250k/s) but am being held back by a large number of segment merges. When I fire up the data stream, the cluster runs well for about 5 minutes, then indexing starts to throttle due to heavy merge activity.
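For context, here is how I've been confirming that the slowdown really is merge throttling rather than something else (a rough check against the node stats; host is a placeholder and output is trimmed):

```
# A rising store throttle_time_in_millis across nodes points at merge throttling
curl -s 'localhost:9200/_nodes/stats/indices?pretty' | grep throttle_time
```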
I want to force Elasticsearch to create large initial segments so that it doesn't waste time merging many small ones. We have plenty of RAM to work with (32 GB for ES), so we can afford to build large segments in memory.
I thought I could achieve this through the following settings:
```
indices.memory.index_buffer_size: 30%
index.translog.flush_threshold_size: 5g
index.refresh_interval: 60s
```
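For reference, the two index-level settings can also be applied at runtime through the update-settings API (the index name below is a placeholder); indices.memory.index_buffer_size is node-level and has to stay in elasticsearch.yml:

```
# Apply the index-level settings live; "my_index" is a placeholder name
curl -XPUT 'localhost:9200/my_index/_settings' -d '{
  "index": {
    "translog.flush_threshold_size": "5g",
    "refresh_interval": "60s"
  }
}'
```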
The idea is to allocate a large amount of memory to the index writer while setting the refresh interval and flush threshold high enough that segments are written out infrequently.
Unfortunately, I am still seeing many small segments being created, resulting in throttling and overall poor performance.
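To quantify that, I've been sampling segment sizes with the cat API (this assumes a 1.x release that has _cat/segments; output trimmed):

```
# Per-shard segment sizes; most come out far below the 5g flush threshold
curl -s 'localhost:9200/_cat/segments?v&h=index,shard,segment,size,size.memory'
```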
Currently I have four active indices, for a total of only 8 shards per node.
Here are a few snapshots of what I'm seeing in Marvel.
As you can see, my Index Writer Memory is nowhere near fully utilized, and my segment count is fairly high.
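For what it's worth, I double-checked the Marvel chart against the raw node stats, which report the same figure as index_writer_memory_in_bytes under the segments section:

```
# If the buffer were filling up, this would approach the 30% index_buffer_size
curl -s 'localhost:9200/_nodes/stats/indices?pretty' | grep index_writer_memory
```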
Any input on what I might be doing wrong or how I can achieve my desired behavior would be greatly appreciated.
Thanks,
Harlin