Hi,
Yes, that's correct. If the data set is completely cached, then you shouldn't see a difference. I just wanted to point out that a spinning disk might be problematic if you use multiple clients.
Based on what you describe, I still wonder whether lock contention is causing your bottleneck. I still think the best way to spot this is to attach a profiler to Elasticsearch while you are running the benchmark. Contention then becomes pretty evident when you look at the thread states of the bulk indexing threads.
Daniel