Index throughput issues - tried all tuning suggestions posted

Hi Kimchy. I have read quite a few posts from you on tuning the indexing in elastic search cluster for initial bulk loads. I have a 4 nodes cluster with 8G per node allocated to my elastic node (also these are VMs with centos and 4 core CPUs). I have tried increasing the shards, configuring the replications to 0, refresh interval to -1, index concurrency from 8 -16 and 32 and I use CB plugin to push data into ES cluster. Unfortunately what ever I do I still see a throughput of 2k records per second. I have 57 million docs to index (each 2k size with probably 200 fields per doc). At that rate it takes around 16-24 hours. But I want to cut this to 1-2 hours.

Please suggest me on how I can get this throughput. Appreciate any help regarding this.