Need to test an index with terabytes of data: how can I do it?

We want to send 2 TB of specific data to an index to see how large the cluster can grow. I have solved that part, but a few things puzzled me during testing.

First question: Does the final throughput result include replicas? I think the result covers only the primary shards, and it seems to be calculated from the samples as sum(bulk_size) / time_period. Is this right?
I have read "How Write throughput is calculated in Rally - #2 by dliappis".
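
To make my understanding concrete, here is a minimal sketch of the calculation I have in mind. It is only an illustration of sum(bulk_size) / time_period, not Rally's actual code: the function name `estimated_throughput`, the sample values, and the assumption that bulk_size is counted in documents are all mine.

```python
def estimated_throughput(bulk_sizes, time_period_s):
    """Return sum(bulk_size) / time_period for one sampling window.

    bulk_sizes: sizes of the bulk requests completed in the window
                (assumed here to be document counts; hypothetical input).
    time_period_s: length of the window in seconds.
    """
    if time_period_s <= 0:
        raise ValueError("time_period_s must be positive")
    return sum(bulk_sizes) / time_period_s


# Hypothetical samples: four bulks of 5000 docs completed within 2 seconds.
# Replica writes are not counted anywhere in this sum, which is the
# behaviour I am asking about.
print(estimated_throughput([5000, 5000, 5000, 5000], 2.0))  # 10000.0
```

If this is right, the reported number reflects only what the clients sent, regardless of how many replicas the index has.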

Second question: When index_append starts, I use iostat to monitor the I/O, and sometimes the read bytes are zero. Why? bulk_indexing_client_num is 64.

Third question: How can I confirm that Rally has started the number of clients that I set?