Just to add to the details, the same settings were working fine earlier. Recently we enabled doc_values on all properties and observed a 1.5-2 times increase in disk utilization. Could that have anything to do with increased merge activity?
Your merges are falling too far behind, and so ES throttles incoming indexing to one thread to let them catch up.
The "maxNumMerges=6" in the logged INFO message comes from 2 + index.merge.scheduler.max_thread_count; it's the total allowed merge backlog before index throttling kicks in.
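The arithmetic behind that log line, as a minimal sketch (the max_thread_count of 4 here is an assumption chosen to reproduce the logged number; the actual value depends on your config and CPU count):

```python
# Merge backlog limit as derived from the merge scheduler's thread count.
# max_thread_count=4 is assumed; the ES 1.x default is
# max(1, min(4, cpu_count // 2)).
max_thread_count = 4
max_num_merges = 2 + max_thread_count
print(max_num_merges)  # matches the "maxNumMerges=6" in the INFO log
```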
I think you should first try removing all settings, so ES defaults apply, except for "indices.store.throttle.type: none" (so that store IO throttling is disabled). Then see if you still hit index throttling ...
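Concretely, the trimmed-down elasticsearch.yml might look like this (everything else left at defaults):

```yaml
# Leave merge scheduler/policy settings at their defaults; only disable
# store-level IO throttling so merges can proceed at full disk speed.
indices.store.throttle.type: none
```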
And don't call optimize, unless the index will not be updated again (e.g. time-based indices).
After removing these configurations, we did not notice any "now throttling indexing" messages in logs. Thanks for the inputs.
But we did end up with "No space left on disk" errors after ~36 hours, which I think could be related to enabling doc_values. With doc_values enabled, we observed a huge increase in the amount of disk used (nearly twice as much). Our documents have a ttl of 24 hours, so after 24 hours space should have been continuously reclaimed. Do you see any reason why the space is not getting reclaimed, or does merging perhaps require additional space to complete?
After around 24 hours, there was around 10G+10G (2 drives) free on each node. But how would we end up with 100% disk utilization on 1-2 nodes if data was continuously getting removed over the next 12 hours of merging activity?
Doc values inherently consume disk space ... this is the tradeoff vs field data (which consumes java heap).
But, do you have very sparse fields? Or, many different types where each type has different fields? The storage format for doc values is not sparse, so this can consume more disk space than you expect ...
Optimize, especially if you ask it to merge down to a single segment, is going to create huge segments which cause "interesting" tradeoffs later on if you keep writing to the index - especially if you delete. Your best bet is never to call optimize unless you are done writing.
Basically, updates and deletes eventually have to rewrite chunks of your index to reclaim space. Optimize makes those chunks much larger.
Just chiming in to say that if you can avoid TTL, you'll greatly reduce your merge pressure.
TTL works by (essentially) running a query every 60s and finding all docs that have expired, then executing individual deletes against those documents. These deleted docs linger in your segments until Lucene's merge scheduler decides to merge them out.
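A toy model of that purge loop's behavior (plain Python, not ES internals; the timestamps and doc ids are just illustrative):

```python
# Toy model of the TTL purger: each pass finds docs whose expiry time has
# passed and deletes them individually. In ES these deletes only mark
# docs as dead inside their segments; the space comes back later, when a
# merge rewrites the segment without them.
def purge_pass(docs, now):
    """docs: list of (doc_id, expire_at) pairs. Returns (live, n_deleted)."""
    live = [(doc_id, exp) for doc_id, exp in docs if exp > now]
    return live, len(docs) - len(live)

docs = [("a", 100), ("b", 200), ("c", 50)]
live, n_deleted = purge_pass(docs, now=120)
# "a" and "c" expired at or before t=120, so this pass issues two
# individual deletes and leaves two holes behind in the segments
```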
Basically, TTL pokes a lot of little holes in all of your segments, which causes the merge scheduler to constantly be cleaning up all the half-filled segments. Which ultimately means you are moving a lot of data around the disk all the time.
If, instead, you can structure your indices using a time-based approach (e.g. index-per-day), you can simply delete the entire index. This is equivalent to deleting a directory off the disk, and doesn't require any expensive merging.
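For example, dropping a whole day's index is a single call (the index name is hypothetical):

```shell
# Deleting a time-based index removes its files from disk outright:
# no per-document deletes, no tombstones, no merge work afterwards.
curl -XDELETE 'http://localhost:9200/logs-2015.06.01'
```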
Usually the time-based index doesn't provide a fine enough granularity for your application, so you'll likely want to include an expire_time field in the document and a corresponding range filter in your query, to make sure docs are no longer served after the 24hr period (but before the index is deleted).
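Something along these lines (ES 1.x filtered-query syntax; the expire_time field name is an assumption):

```json
{
  "query": {
    "filtered": {
      "query": { "match_all": {} },
      "filter": {
        "range": {
          "expire_time": { "gt": "now" }
        }
      }
    }
  }
}
```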
IIRC, the one legitimate use-case for TTL is something like an auction where the expiration is dynamic. Some auctions like to extend the time after a bid for example (to prevent sniping), so a strict field expiration + delete-index approach wouldn't work because, theoretically, an auction could extend indefinitely if people keep bidding.
There are probably a few other rare edge-cases, but that's the one that comes to mind.
That said, it might be nice if we could provide an "efficientTTL" which helps manage the non-TTL-backed approach. Not sure how it'd look on the query end... a new "expired" filter? A special expired date expression, so you could say "gte": "expired", which is just shorthand for now - <defined retention period>?
You'd index documents based on their expiry date and store it with the document, so that you could smash the whole index after a while and know "everything in there was expired anyway". And you'd use an alias/alias-ish thing to add a simple date filter, I think.
I think if you know when the document will expire up front that is really the way to go for TTL like stuff. But you can build that on the client side.
That doesn't solve the problem where you don't know the expiration up front, because changing it would require removing the document from one index and dropping it into another. Which is problematic because refresh times don't line up. I bet someone sufficiently motivated could make TTL more efficient in the single-index case by being sneaky with the merge scheduler: never merging segments whose TTLs differ by more than an hour, and letting a segment get very delete-ful without merging if it knows the whole thing will be past its TTL soon. It's fun to think about, but it'd be a bunch of work.