Bulk loading performance slow & varies in ES 2.1.0

I am testing bulk loading, and the loading time varies significantly for the same batch size and similar document sizes. I see no CPU, IO, or memory bottlenecks. The cluster has 3 master, 5 data, and 3 client nodes.
Bulk inserts go through the client nodes/haproxy. Average document size is 2KB+ and batch size is 3000, with 10 concurrent bulk load processes. There is no significant difference in performance between 1 and 0 replicas.
The bulk loader is a Python script that generates random data.
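
For illustration, here is a minimal sketch of the general shape of such a loader. It is not the actual script: the index name `test_data`, the type `doc`, the URL, and all function names are placeholders assumed for this example. It builds one newline-delimited bulk body of random documents and times a single POST to `_bulk`, so the client-side elapsed time can be compared with the `took` value the bulk response reports.

```python
import json
import random
import string
import time
import urllib.request


def random_doc(size_bytes=2048):
    """Return a dict whose JSON encoding is at least size_bytes long."""
    payload = "".join(random.choice(string.ascii_letters) for _ in range(size_bytes))
    return {"id": random.randint(0, 10**9), "payload": payload}


def build_bulk_body(docs, index="test_data", doc_type="doc"):
    """Encode docs as action/source line pairs for the _bulk API."""
    lines = []
    for doc in docs:
        # Action line first, then the document source on its own line.
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


def timed_upload(body, url="http://localhost:9200/_bulk"):
    """POST one bulk body; return (client_elapsed_ms, parsed_response)."""
    req = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    start = time.time()
    with urllib.request.urlopen(req) as resp:
        result = json.loads(resp.read().decode("utf-8"))
    return int((time.time() - start) * 1000), result


def run_once(n_docs=3000):
    """Upload a single batch and print client-side vs. server-side timing."""
    body = build_bulk_body([random_doc() for _ in range(n_docs)])
    elapsed_ms, result = timed_upload(body)
    print("client: %dms, server took: %sms, errors: %s"
          % (elapsed_ms, result.get("took"), result.get("errors")))
```

Calling `run_once()` against a reachable node sends one batch of 3000 documents. If the client-side time is much larger than the server-side `took`, the delay is probably outside indexing itself (network, proxy, or queueing), though that is only a rough indicator.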

Increased config for:
threadpool.bulk.queue_size: 5000
threadpool.index.queue_size: 5000
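
To see whether the bulk queues actually fill up or reject requests during the slow periods, something like the following could be polled while the load is running. This is a sketch under assumptions: the base URL is a placeholder, the helper names are made up for this example, and the column names (`bulk.active`, `bulk.queue`, `bulk.rejected`) are the ones I believe `_cat/thread_pool` exposes in this version, so they should be checked against the cluster's own `_cat/thread_pool?help` output.

```python
import urllib.request


def parse_cat_table(text):
    """Parse the whitespace-separated table a _cat API returns with ?v."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if not rows:
        return []
    header = rows[0]
    # One dict per node, keyed by the column names from the header row.
    return [dict(zip(header, row)) for row in rows[1:]]


def nodes_with_bulk_rejections(rows):
    """Return the hosts whose bulk thread pool has rejected any request."""
    return [r["host"] for r in rows if int(r.get("bulk.rejected", 0)) > 0]


def fetch_thread_pool(base_url="http://localhost:9200"):
    """Fetch and parse bulk thread pool stats (URL and columns are assumptions)."""
    url = base_url + "/_cat/thread_pool?v&h=host,bulk.active,bulk.queue,bulk.rejected"
    with urllib.request.urlopen(url) as resp:
        return parse_cat_table(resp.read().decode("utf-8"))
```

For example, `nodes_with_bulk_rejections(fetch_thread_pool())` lists the nodes that have rejected bulk requests so far.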

Bad Times:
[I 151210 11:21:45 es_test_data:60] Upload: OK - upload took: 15983ms, total docs uploaded: 753000
[I 151210 11:21:45 es_test_data:60] Upload: OK - upload took: 20327ms, total docs uploaded: 741000
[I 151210 11:21:45 es_test_data:60] Upload: OK - upload took: 13780ms, total docs uploaded: 753000
[I 151210 11:21:45 es_test_data:60] Upload: OK - upload took: 13229ms, total docs uploaded: 870000
[I 151210 11:21:45 es_test_data:60] Upload: OK - upload took: 13812ms, total docs uploaded: 744000
[I 151210 11:21:45 es_test_data:60] Upload: OK - upload took: 19866ms, total docs uploaded: 759000

Good Times:
[I 151210 11:22:30 es_test_data:60] Upload: OK - upload took: 907ms, total docs uploaded: 768000
[I 151210 11:22:30 es_test_data:60] Upload: OK - upload took: 939ms, total docs uploaded: 750000
[I 151210 11:22:30 es_test_data:60] Upload: OK - upload took: 969ms, total docs uploaded: 762000
[I 151210 11:22:30 es_test_data:60] Upload: OK - upload took: 866ms, total docs uploaded: 762000
[I 151210 11:22:30 es_test_data:60] Upload: OK - upload took: 982ms, total docs uploaded: 753000
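
To put a number on the variation, the loader output can be summarized with a small script like the one below. It is only a sketch: the function names are made up for this example, and the throughput figure assumes each log line corresponds to one batch of 3000 documents, which the log itself does not state.

```python
import re
import statistics

# Matches the timing part of lines such as:
# "... Upload: OK - upload took: 907ms, total docs uploaded: 768000"
LINE_RE = re.compile(r"upload took: (\d+)ms, total docs uploaded: (\d+)")


def parse_upload_times(log_text):
    """Return the 'upload took' values (in ms) found in the loader output."""
    return [int(m.group(1)) for m in LINE_RE.finditer(log_text)]


def summarize(times_ms, batch_size=3000):
    """Summarize batch timings; batch_size is an assumed docs-per-line value."""
    mean_ms = statistics.mean(times_ms)
    return {
        "count": len(times_ms),
        "min_ms": min(times_ms),
        "max_ms": max(times_ms),
        "mean_ms": mean_ms,
        "docs_per_sec": batch_size / (mean_ms / 1000.0),
    }
```

Applied to the samples above, the bad batches average about 16.2 seconds and the good ones about 0.93 seconds, roughly a 17x difference for the same batch size.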