Elasticsearch on AWS EC2: indexing very slow

I set up an Elasticsearch 2.1 cluster with 3 nodes and 5 shards on AWS EC2, using m4.large (8 GB) instances. I am bulk indexing with the Java transport client. The same indexing run on my local Mac takes about 6 minutes for 500K documents, but on EC2 it takes about 40 minutes. I checked CPU, JVM, thread pools, GC, and merges; everything looks fine on the EC2 instances.
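For context, here is a minimal sketch of how the bulk indexing is wired up with the 2.x transport client and BulkProcessor (the cluster name, host, index/type names, batch sizes, and document payload are placeholders, not necessarily my exact values):

```java
import org.elasticsearch.action.bulk.BulkProcessor;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.common.unit.ByteSizeUnit;
import org.elasticsearch.common.unit.ByteSizeValue;

import java.net.InetAddress;
import java.util.concurrent.TimeUnit;

public class BulkIndexer {
    public static void main(String[] args) throws Exception {
        // Connect to one data node; "my-cluster" and the IP are placeholders.
        Settings settings = Settings.settingsBuilder()
                .put("cluster.name", "my-cluster")
                .build();
        Client client = TransportClient.builder().settings(settings).build()
                .addTransportAddress(new InetSocketTransportAddress(
                        InetAddress.getByName("10.40.38.111"), 9300));

        BulkProcessor bulkProcessor = BulkProcessor.builder(client,
                new BulkProcessor.Listener() {
                    @Override
                    public void beforeBulk(long executionId, BulkRequest request) {}

                    @Override
                    public void afterBulk(long executionId, BulkRequest request,
                                          BulkResponse response) {
                        // Log per-document failures instead of silently dropping them.
                        if (response.hasFailures()) {
                            System.err.println(response.buildFailureMessage());
                        }
                    }

                    @Override
                    public void afterBulk(long executionId, BulkRequest request,
                                          Throwable failure) {
                        failure.printStackTrace();
                    }
                })
                .setBulkActions(1000)                                // flush every 1000 docs
                .setBulkSize(new ByteSizeValue(5, ByteSizeUnit.MB))  // or every 5 MB
                .setConcurrentRequests(1)                            // one bulk in flight
                .build();

        for (int i = 0; i < 500_000; i++) {
            bulkProcessor.add(new IndexRequest("myindex", "mytype")
                    .source("{\"field\":\"value " + i + "\"}"));
        }
        bulkProcessor.awaitClose(1, TimeUnit.MINUTES);
        client.close();
    }
}
```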

I don't know much about EC2 storage. I am using gp2 volumes with 120 IOPS. I timed the app, and the time is mainly spent writing on the index server. Could gp2 be this slow?

So the issue is either the network between my app server and the Elasticsearch nodes, or something inside Elasticsearch itself. To simplify things I have set replicas to 0 for now. refresh_interval is "30s", index.translog.durability is async, and the flush interval is 30s.
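For reference, these settings can be applied through the transport client's admin API, roughly like this (a sketch; "myindex" is a placeholder, and the flush interval is omitted since it is configured separately):

```java
import org.elasticsearch.client.Client;
import org.elasticsearch.common.settings.Settings;

public class TuneIndexSettings {
    // Apply the bulk-loading settings described above; "myindex" is a placeholder.
    public static void applySettings(Client client) {
        Settings settings = Settings.settingsBuilder()
                .put("index.number_of_replicas", 0)         // no replicas while bulk loading
                .put("index.refresh_interval", "30s")       // refresh less often than the 1s default
                .put("index.translog.durability", "async")  // fsync the translog in the background
                .build();
        client.admin().indices()
                .prepareUpdateSettings("myindex")
                .setSettings(settings)
                .get();
    }
}
```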

Has anybody run into this before? I was wondering whether this is a provisioning issue or a problem with my Elasticsearch configuration.

Output of `_cat/nodes`:

```
host         ip           heap.percent ram.percent load node.role master name
10.40.38.111 10.40.38.111 45           65          0.00 d         m      ip-10-40-38-111
10.40.37.36  10.40.37.36  9            67          0.00 d         *      ip-10-40-37-36
10.40.38.213 10.40.38.213 65           67          0.01 d         m      ip-10-40-38-213
```

That seems pretty slow to me.

Thanks. Based on the AWS docs I have read, the IOPS allotment is based on the size of the volume. I set up a 40 GiB gp2 volume, and the console shows:

vol-6f381592, 40 GiB, gp2, 120 / 3000 IOPS, snap-4e3c6d2b, December 28, 2015 at 2:18:06 PM UTC-8, us-east-1b

I am not sure if there are statistics on this.
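If I understand the gp2 model correctly, the baseline is 3 IOPS per GiB with a 100 IOPS minimum, so a 40 GiB volume gets 40 × 3 = 120 baseline IOPS, which matches the "120 / 3000" shown. The 3000 is the burst ceiling for smaller volumes, sustained only while burst credits last, so a long bulk-indexing run can drain the credits and fall back to the 120 IOPS baseline.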