Stress testing ES-Hadoop

@costin We bumped up the executor memory and it now works on a standalone cluster, though it took almost 90 minutes to load all the data.
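
For context, this is roughly how the executor memory was bumped; it's a minimal sketch rather than our exact job config, and the concrete values (the "8g" and the app name) are illustrative, not the ones we actually used. The same setting can also be passed as --executor-memory to spark-submit.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Illustrative settings only; the real values were tuned to the cluster.
val conf = new SparkConf()
  .setAppName("es-hadoop-stress-test")     // hypothetical app name
  .set("spark.executor.memory", "8g")      // bumped executor heap; value is an example
val sc = new SparkContext(conf)
```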

Now I'm running it on YARN to test and I'm getting a different error message:

ge 0.0 failed 4 times, most recent failure: Lost task 3.3 in stage 0.0 (TID 10, mavencode.ca): java.lang.IllegalArgumentException: Size exceeds Integer.MAX_VALUE
        at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:836)
        at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:125)
        at org.apache.spark.storage.DiskStore$$anonfun$getBytes$2.apply(DiskStore.scala:113)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1285)
        at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:127)
        at org.apache.spark.storage.DiskStore.getBytes(DiskStore.scala:134)
        at org.apache.spark.storage.BlockManager.doGetLocal(BlockManager.scala:511)
        at org.apache.spark.storage.BlockManager.getLocal(BlockManager.scala:429)
        at org.apache.spark.storage.BlockManager.get(BlockManager.scala:617)
        at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:44)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:242)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)

Still pulling my hair out trying to figure out how to fix this.
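
In case it helps narrow things down, here is a minimal sketch of the kind of workaround I'm considering, not my actual job; the input path, index name, parsing logic, and partition count are all placeholders. The "Size exceeds Integer.MAX_VALUE" is thrown by FileChannelImpl.map, which cannot map a block larger than 2 GB, so the idea is to repartition so each partition (and hence each cached or spilled block) stays well under 2 GB before writing with saveToEs.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

object LoadToEs {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("es-hadoop-stress-test")      // hypothetical app name
      .set("es.nodes", "localhost:9200")        // assumed ES endpoint
    val sc = new SparkContext(conf)

    val raw = sc.textFile("hdfs:///data/input") // hypothetical input path

    // More, smaller partitions => each block stays far below the 2 GB limit.
    val docs = raw
      .repartition(2000)                        // partition count is illustrative
      .map(line => Map("line" -> line))         // placeholder parsing logic

    docs.saveToEs("stress/docs")                // hypothetical index/type
  }
}
```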