Disk I/O 100%

I have a 2-node setup (default configs, ES 2.1); both nodes are set as master and data. They are identical servers (CPU, RAM, disk) in an EC2 environment, and the disks are magnetic.
The data is streamed via the bulk API (~500 msg/s). It all goes well for about 24h, and then something goes wrong.
The master that I send data to has disk I/O at about 25% (usually it's 8-10%), and the slave is at 100%. Any bulk data that I send while the slave is at 100% is really slow. Basically, it cannot send fast enough, and the message queue grows too much.

Any ideas why it would go up to 100%, or what I could do to increase the throughput?
Also, I read that replication is sync, so it waits for the replica to ack the data.


Can you share the hot_threads output for those nodes?
We'll need more information to understand your problem :wink:
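
In case it helps, here is a rough sketch of how the hot threads output could be collected with a small Python script. It assumes the node's HTTP endpoint is reachable at something like `http://localhost:9200` (adjust the host and port for your setup); the helper names `hot_threads_url` and `fetch_hot_threads` are just made up for this example.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def hot_threads_url(base_url, node_id=None, threads=3):
    """Build the hot threads URL for all nodes, or for one node if node_id is given."""
    if node_id is None:
        path = "/_nodes/hot_threads"
    else:
        path = "/_nodes/%s/hot_threads" % quote(node_id, safe="")
    # "threads" is the number of hot threads to report per node.
    return base_url.rstrip("/") + path + "?" + urlencode({"threads": threads})


def fetch_hot_threads(base_url, node_id=None, threads=3, timeout=30):
    """Return the plain-text hot threads report from the cluster."""
    url = hot_threads_url(base_url, node_id, threads)
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Calling `print(fetch_hot_threads("http://localhost:9200"))` while the slow node is at 100% disk I/O should give the most useful output; the base URL here is an assumption, not something taken from your setup.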