Parallel Iterations of Bulk

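When you generate tasks with a Jinja2 for loop, the (small) delay between them is the same as the delay you see between explicitly defined operations in a regular track. The loop is expanded when the track is rendered, so Rally simply sees a sequence of ordinary tasks and runs them one after another.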

Tasks inside the parallel element run independently of each other, and each can have its own clients and target-throughput. There is a large number of examples for the parallel element in this part of the documentation that I suggest you take a look at.
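To make that concrete, here is a minimal sketch in Python that builds a parallel element as a plain dict and prints it as JSON. The property names (parallel, tasks, name, operation, clients, target-throughput, operation-type, bulk-size) are the ones Rally tracks use; the helper function, the task names and all numbers are made up for illustration, so treat it as a starting point rather than a tested track.

```python
import json


def parallel_bulk_tasks(task_specs):
    """Build a Rally-style "parallel" element.

    task_specs is a list of (name, clients, target_throughput) tuples.
    This helper is hypothetical; it only assembles the JSON structure.
    """
    tasks = []
    for name, clients, target_throughput in task_specs:
        tasks.append({
            "name": name,
            # Inline bulk operation; the bulk-size value is an arbitrary example.
            "operation": {"operation-type": "bulk", "bulk-size": 5000},
            # Each task carries its own client count and throughput target.
            "clients": clients,
            "target-throughput": target_throughput,
        })
    return {"parallel": {"tasks": tasks}}


# Two bulk tasks that would run concurrently with different settings.
schedule_entry = parallel_bulk_tasks([("bulk-a", 2, 100), ("bulk-b", 4, 300)])
print(json.dumps(schedule_entry, indent=2))
```

The printed JSON is what you would place in the schedule of a challenge. In a real track you would more likely write the JSON by hand or generate the task list with a Jinja2 loop inside the tasks array.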

This sounds normal. target-throughput is not a property that will "accelerate" the execution of bulk by automatically increasing clients; if you've specified 1 client, Rally will try to reach the specified target-throughput with that 1 client and nothing more. If target-throughput is smaller than what 1 client can achieve, Rally will pause the schedule as required to honor it, but it won't automatically add clients to reach a larger throughput. If the target-throughput can't be achieved, you need to scale the number of clients and/or the bulk size yourself (and read up on sizing Elasticsearch).
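A rough way to reason about this is the simplified model below. It is not how Rally computes anything internally; it assumes every client tops out at the same rate (per_client_max, a number you would have to measure yourself) and that throughput scales linearly with clients, which a real cluster will only do up to a point. Both function names are hypothetical.

```python
import math


def achievable_throughput(target, clients, per_client_max):
    """Throughput you can expect under the simplified linear model.

    The target acts as an upper bound (throttling); it never adds capacity.
    """
    capacity = clients * per_client_max
    return min(target, capacity)


def clients_needed(target, per_client_max):
    """Smallest client count whose combined capacity reaches the target."""
    return math.ceil(target / per_client_max)


# Target below what one client can do: the client is throttled to the target.
print(achievable_throughput(target=50, clients=1, per_client_max=200))   # 50

# Target above what one client can do: you stay at the client's maximum.
print(achievable_throughput(target=500, clients=1, per_client_max=200))  # 200

# Under this model, three clients would be needed to reach 500.
print(clients_needed(target=500, per_client_max=200))                    # 3
```

In practice you would raise clients (or the bulk size) step by step and watch the throughput Rally reports, because the cluster itself becomes the bottleneck at some point.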