Thanks for the reply! I have a couple of follow-up questions, if you don't mind. What factors affect disk I/O usage: the number of shards being indexed into, the total size of those shards, or the number of documents being indexed? Intuitively I would guess that each of these has some impact, but is one of them typically the main driver of high I/O?
There are definitely spikes in I/O throughput (I haven't yet figured out how to look at iowait), but they don't seem to explain the spikes in search latency. For example, during our nightly job, when we reindex the entire dataset in the background, both read and write throughput are much higher, yet we don't see the same latency spikes. That's not to say I/O isn't the problem, just that the connection isn't immediately obvious, or it could be a related issue.
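On a Linux host, `vmstat` (the `wa` column) and `iostat` from sysstat (`%iowait`) both report iowait directly. As a minimal sketch, assuming a Linux node where `/proc/stat` is readable, the same figure can be computed by sampling the aggregate `cpu` line twice and taking the iowait share of the CPU time that elapsed in between. The helper names below are hypothetical and not taken from any existing tool.

```python
import time


def parse_cpu_line(line: str) -> list[int]:
    """Parse the aggregate 'cpu' line of /proc/stat into jiffy counters."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    return [int(x) for x in fields[1:]]


def iowait_percent(before: str, after: str) -> float:
    """Percentage of CPU time spent in iowait between two /proc/stat samples."""
    b = parse_cpu_line(before)
    a = parse_cpu_line(after)
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice.
    # guest/guest_nice are already included in user/nice, so only the first 8 are summed.
    deltas = [new - old for new, old in zip(a[:8], b[:8])]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is iowait


def read_cpu_line(path: str = "/proc/stat") -> str:
    """Return the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as f:
        return f.readline()


if __name__ == "__main__":
    first = read_cpu_line()
    time.sleep(1.0)  # sampling interval; shorter intervals are noisier
    second = read_cpu_line()
    print(f"iowait: {iowait_percent(first, second):.1f}%")
```

Running this in a loop during a latency spike and again during the nightly reindex would show whether the CPUs are actually stalled on I/O in one case and not the other. Note that iowait is a system-wide, fairly coarse signal: it only counts time when a CPU is idle *and* there is outstanding I/O, so it can stay low even when disks are busy.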