Thanks for responding and providing the link to that thread. Much appreciated!
In this situation, I've decided to use Redis to temporarily hold values (for X minutes) for the key in the documents that get updated. There is a trade-off here: either you have up-to-the-second accurate values for this specific field but hammer the storage system and cause high I/O with constant updates, or you hold these fields in Redis and periodically flush the information back into Elasticsearch.
Basically, my use-case is that I'm storing Reddit submission data, which includes a field called "num_comments" that records how many comments were made on that submission. Previously, every time a comment came in, I would update the corresponding submission document by incrementing the num_comments field by 1. To reduce I/O and the frequency of updates, I now use Redis to hold the submission id, increment the num_comments counter in Redis, and flush every 5 minutes. So if a submission had 40 new comments in that five-minute span, the old method would require 40 separate updates to the submission document within Elasticsearch, each incrementing by 1. Now I flush every 5 minutes and make one update to the submission document, incrementing the field by 40.
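Here's a minimal sketch of that buffer-and-flush pattern. A real deployment would use `redis.Redis.hincrby()` for the counter and `elasticsearch.helpers.bulk()` for the flush; to keep the sketch self-contained, the buffer is an in-memory dict, and the index name "submissions" is an assumption.

```python
from collections import defaultdict


class CommentBuffer:
    """Accumulates num_comments increments and drains them in one batch."""

    def __init__(self):
        # submission_id -> number of comments seen since the last flush.
        # With Redis this would be a hash updated via HINCRBY.
        self.pending = defaultdict(int)

    def record_comment(self, submission_id):
        # Redis equivalent: r.hincrby("pending_comments", submission_id, 1)
        self.pending[submission_id] += 1

    def flush(self):
        """Turn the buffered counts into one bulk-update payload.

        Each pending entry becomes a single scripted update that adds the
        accumulated count, so 40 buffered comments cost one write, not 40.
        The returned list is in the shape expected by
        elasticsearch.helpers.bulk(), which a real flusher would call here.
        """
        actions = []
        for submission_id, count in self.pending.items():
            actions.append({
                "_op_type": "update",
                "_index": "submissions",  # assumed index name
                "_id": submission_id,
                "script": {
                    "source": "ctx._source.num_comments += params.n",
                    "params": {"n": count},
                },
            })
        self.pending.clear()
        return actions
```

Running `flush()` on a schedule (e.g. every 5 minutes) collapses all the per-comment writes into one scripted update per submission.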
So for other developers facing similar concerns, you can reduce I/O contention and the number of document updates by using a service like Redis to cache the increments until you want to flush those changes.
So like everything else in life, there is a compromise: in this case, I've chosen to give up some of the "real-time"-ness of the field to save I/O and reduce write amplification. This strategy works well if you can afford "near real-time" data.
In fact, you could layer Redis on top of the Elasticsearch data by merging the cached increments into the documents returned by Elasticsearch; however, you lose the ability to accurately search or sort on those fields until you flush.
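The read-time overlay could look something like this sketch: `hits` stands in for documents returned by a search, and `pending` for the not-yet-flushed counts that Redis would hold (both names are assumptions, not part of any library API).

```python
def merge_cached_counts(hits, pending):
    """Overlay buffered increments onto docs returned by Elasticsearch.

    hits:    list of document dicts from a search, each with an "id" and
             a possibly stale "num_comments" value.
    pending: mapping of submission id -> not-yet-flushed comment count
             (what the Redis buffer would hold).

    The merged num_comments shown to the caller is current, but note that
    sorting or filtering on the field *inside* Elasticsearch still sees
    the stale stored value until the next flush.
    """
    merged = []
    for hit in hits:
        doc = dict(hit)  # don't mutate the original search result
        doc["num_comments"] = (
            doc.get("num_comments", 0) + pending.get(doc["id"], 0)
        )
        merged.append(doc)
    return merged
```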