The indexing rate drops to 0 at around the same time the packet drop falls to 0 (about 30K packets received at that point). The indexing rate stays at 0 for about 20-40 seconds before climbing again, while packet drop remains at 0 for about 2 more minutes.
This sounds like it could be garbage collection. How many CPU cores does Elasticsearch have access to?
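One way to test the GC theory is to sample `GET _nodes/stats/jvm` twice, a known interval apart, and see what fraction of that wall-clock time each node spent collecting. The sketch below assumes you already have both responses parsed as JSON (how you fetch them, and any auth, is left out). It sums the `collection_time_in_millis` deltas over all collectors reported under `jvm.gc.collectors` (typically `young` and `old`). For the core count, `GET _nodes/os` reports `os.available_processors` per node.

```python
def gc_time_fraction(before, after, interval_s):
    """Fraction of wall-clock time each node spent in GC between two samples.

    `before` and `after` are parsed JSON bodies of GET _nodes/stats/jvm
    taken `interval_s` seconds apart. Returns {node_id: fraction}.
    """
    fractions = {}
    for node_id, node in after["nodes"].items():
        prev_node = before["nodes"].get(node_id)
        if prev_node is None:
            continue  # node joined between the two samples; no baseline to diff against
        prev = prev_node["jvm"]["gc"]["collectors"]
        curr = node["jvm"]["gc"]["collectors"]
        # Sum GC time across all collectors (e.g. "young" and "old").
        gc_ms = sum(
            stats["collection_time_in_millis"]
            - prev.get(name, {}).get("collection_time_in_millis", 0)
            for name, stats in curr.items()
        )
        fractions[node_id] = gc_ms / (interval_s * 1000.0)
    return fractions
```

If you sample across one of the 20-40 second stalls and a node shows a large fraction (especially from the `old` collector), that would support the GC explanation; a low fraction would point elsewhere.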
Were you ever able to check whether your servers show a lot of IO wait time on the SATA SSDs?
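Tools such as `iostat -x` or `vmstat` report IO wait directly, but if you want to log it alongside your indexing metrics, here is a minimal sketch for Linux that computes the IO wait percentage from two snapshots of `/proc/stat`. It assumes the standard column order of the aggregate `cpu` line (user, nice, system, idle, iowait, irq, softirq, steal, then guest and guest_nice, which the kernel already counts inside user and nice, so they are excluded from the total).

```python
import time


def read_cpu_times(stat_text):
    """Return the counters from the aggregate 'cpu' line of /proc/stat contents."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(x) for x in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before, after):
    """Percentage of CPU time spent waiting on IO between two samples."""
    # Only the first 8 columns (user..steal); guest/guest_nice are already in user/nice.
    deltas = [a - b for a, b in zip(after[:8], before[:8])]
    total = sum(deltas)
    if total == 0:
        return 0.0
    return 100.0 * deltas[4] / total  # column 4 is iowait


def sample_iowait(interval_s=5.0, path="/proc/stat"):
    """Read /proc/stat twice, interval_s apart, and return the IO wait percentage."""
    with open(path) as f:
        first = read_cpu_times(f.read())
    time.sleep(interval_s)
    with open(path) as f:
        second = read_cpu_times(f.read())
    return iowait_percent(first, second)
```

Calling `sample_iowait()` on each Elasticsearch node during a stall should show whether the CPUs are mostly idle while waiting on the disks.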
What happens if you increase shards per index from 2 to 4? You can have multiple shards of the same index on the same node, which may give you some additional performance if you are disk IO limited.
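Note that `number_of_shards` is fixed when an index is created, so existing indices keep 2 shards (changing them means the `_split` API or a reindex). For time-based indices, the usual route is an index template so that new indices pick up the new count. A minimal sketch of a body for `PUT _index_template/<name>` (composable templates, Elasticsearch 7.8+); the `packets-*` pattern is a hypothetical placeholder for your own index naming.

```python
import json


def shard_template_body(index_pattern, shards, replicas=None):
    """Build a body for PUT _index_template/<name> that sets the shard count."""
    settings = {"number_of_shards": shards}
    if replicas is not None:
        settings["number_of_replicas"] = replicas
    return {"index_patterns": [index_pattern], "template": {"settings": settings}}


if __name__ == "__main__":
    # "packets-*" is hypothetical; substitute the pattern your indices actually use.
    print(json.dumps(shard_template_body("packets-*", 4), indent=2))
```

If you already have a template that sets these settings, editing it in place is simpler than adding a second one, since overlapping composable templates are resolved by `priority`.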
I think the merge times might be misleading: you might have gone from infrequent long merges to more frequent shorter merges, so (I didn't do any real math here) overall you probably spend a similar amount of time merging. As I understand it, long merge times can also be an indicator of disk IO limits, since merges rely heavily on reads and writes.
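To check this rather than eyeball it, you can diff two samples of the `merges` section from `GET <index>/_stats/merge` (for example under `_all.total.merges`) and compare total time against time per merge. The sketch assumes both samples are already parsed as JSON. As a made-up illustration, 10 merges of 6 s each and 60 merges of 1 s each both add up to 60 s of merging even though the per-merge time dropped sixfold. A growing `total_throttled_time_in_millis` may be a further hint that merging is struggling to keep up.

```python
def merge_summary(before, after):
    """Compare two samples of the 'merges' section of GET <index>/_stats/merge."""
    count = after["total"] - before["total"]
    time_ms = after["total_time_in_millis"] - before["total_time_in_millis"]
    throttled_ms = (after["total_throttled_time_in_millis"]
                    - before["total_throttled_time_in_millis"])
    return {
        "merges": count,
        "total_time_s": time_ms / 1000.0,  # overall time spent merging
        "avg_per_merge_s": time_ms / 1000.0 / count if count else 0.0,
        "throttled_s": throttled_ms / 1000.0,
    }
```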