I have a Hadoop job running on AWS EMR that uses ES-Hadoop with Cascading. It does bulk inserts in batches of 10,000 records of about 4 KB each, roughly 300M records in total.
I was wondering whether there would be any speed benefit from using Spark instead.
Without knowing more about your job, I can only relate my general experience. We've found that Spark reduces runtimes compared with traditional MapReduce when indexing to Elasticsearch. I am not an Elasticsearch expert, but it seems data locality may play a part. We used to load data with a custom jar loader in a YARN job, and have since replaced it with the ES-Hadoop Spark library.
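For reference, here is a minimal PySpark sketch of what a bulk write through the ES-Hadoop Spark library can look like. It is not our actual job: the function names, the host and the index name are made up for illustration, and the batch figures are simply the ones from the question. The `es.nodes`, `es.batch.size.entries` and `es.batch.size.bytes` settings are standard ES-Hadoop options; the connector jar is assumed to be on the Spark classpath.

```python
import math


def es_write_options(nodes, batch_entries=10_000, record_bytes=4 * 1024):
    """Build ES-Hadoop write settings for bulk indexing (illustrative helper).

    ES-Hadoop flushes a bulk request as soon as either the entry limit or
    the byte limit is reached, so the byte limit is sized to hold a full batch.
    """
    batch_mb = math.ceil(batch_entries * record_bytes / (1024 * 1024))
    return {
        "es.nodes": nodes,
        "es.batch.size.entries": str(batch_entries),
        "es.batch.size.bytes": f"{batch_mb}mb",
    }


def write_to_es(df, index, options):
    """Append a Spark DataFrame to an Elasticsearch index via ES-Hadoop."""
    (df.write
       .format("org.elasticsearch.spark.sql")
       .options(**options)
       .mode("append")
       .save(index))


# Usage (hypothetical host and index; `df` is an existing Spark DataFrame):
#   opts = es_write_options("es-host:9200")
#   write_to_es(df, "records", opts)
```

With the figures from the question this sets `es.batch.size.bytes` to `40mb` per task. Whether a bulk request that large suits a given cluster is something to measure rather than assume.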
A big advantage that Spark SQL has over other libraries is that it allows push-down: the operations in a Spark SQL query can be detected and pushed down by third-party plugins (like ES-Hadoop) to the underlying data source. This significantly reduces the amount of data that needs to be pulled in from ES.
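As a rough illustration of push-down on the read side, here is a small PySpark sketch. The function names, host, index and field names (`level`, `message`) are hypothetical; `pushdown` is an ES-Hadoop Spark SQL option that is enabled by default and is set explicitly here only to make it visible.

```python
def read_from_es(spark, index, nodes, pushdown=True):
    """Load an Elasticsearch index as a DataFrame via ES-Hadoop.

    With pushdown enabled, filters Spark SQL can express as source filters
    are handed to the connector and run in Elasticsearch.
    """
    return (spark.read
            .format("org.elasticsearch.spark.sql")
            .option("es.nodes", nodes)
            .option("pushdown", "true" if pushdown else "false")
            .load(index))


def error_messages(df):
    """Filter on a hypothetical `level` field and keep only `message`."""
    return df.filter(df["level"] == "ERROR").select("message")


# Usage (hypothetical host and index; `spark` is an existing SparkSession):
#   logs = read_from_es(spark, "logs", "es-host:9200")
#   error_messages(logs).explain()  # inspect the plan Spark produced
```

The filter is written as an ordinary DataFrame operation; nothing in the query itself is Elasticsearch-specific, which is the point of push-down.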