ESHadoop - Hadoop vs Spark

Pat_Humphreys · April 15, 2016, 2:30pm

I currently have a Hadoop job running on AWS EMR that uses ESHadoop with Cascading. It does bulk inserts in batches of 10,000 records of about 4k each, with about 300M records in total.
I was wondering whether there would be any speed benefit from using Spark instead?

jhendric98 · April 18, 2016, 7:59pm

Without hearing more about your job, I can only relate my general experience. We've found that Spark reduces runtimes compared with traditional MR when indexing to Elasticsearch. I am not an Elasticsearch expert, but it seems data locality may play a part. We used to load data with a custom jar loader in a YARN job and have since replaced it with the ES-Hadoop Spark library.
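
For illustration only, here is a rough sketch of what an indexing job with the ES-Hadoop Spark library could look like from PySpark. It is not the code from either job in this thread. It assumes "4k" means roughly 4 KB per record, that the elasticsearch-hadoop jar is on the Spark classpath, and that `df` is a DataFrame you already have; the function names, host, and index name are hypothetical. The `es.batch.size.entries` and `es.batch.size.bytes` settings are per task, and a bulk request is flushed when either limit is hit, so the byte limit has to cover the whole batch.

```python
import math


def es_bulk_write_options(nodes, docs_per_bulk, avg_doc_bytes):
    """Build ES-Hadoop write settings sized for about docs_per_bulk documents per bulk request."""
    # A bulk request is flushed when EITHER limit is reached, so the byte
    # limit must be at least as large as the expected payload.
    payload_bytes = docs_per_bulk * avg_doc_bytes
    payload_mb = math.ceil(payload_bytes / (1024 * 1024))
    return {
        "es.nodes": nodes,
        "es.batch.size.entries": str(docs_per_bulk),
        "es.batch.size.bytes": f"{payload_mb}mb",
    }


def write_to_es(df, resource, options):
    """Append a Spark DataFrame to an Elasticsearch resource via ES-Hadoop.

    df is expected to be a pyspark.sql.DataFrame (assumption: created elsewhere).
    """
    (df.write
       .format("org.elasticsearch.spark.sql")
       .options(**options)
       .mode("append")
       .save(resource))


# Hypothetical host; 10,000 records of ~4 KB each, as in the question.
options = es_bulk_write_options("es-host:9200", 10_000, 4 * 1024)
print(options)
# write_to_es(df, "my-index", options)  # needs a live Spark session and cluster
```

With many concurrent Spark tasks each sending batches of this size, the cluster may end up being the bottleneck, so smaller batches could well perform better; the numbers here only mirror the question.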

costin · April 21, 2016, 6:08am

A big advantage that Spark SQL has over other libraries is that it allows push down: in Spark SQL, the operations being executed can be detected and thus pushed down by 3rd party plugins (like ES-Hadoop). This significantly reduces the amount of data that needs to be pulled in from ES.
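
As a minimal sketch of the push-down idea, assuming PySpark with the elasticsearch-hadoop jar on the classpath: the DataFrame below is read through the ES-Hadoop data source with its `pushdown` option enabled, so simple filters such as equality and range comparisons can be translated into an Elasticsearch query instead of being applied after all documents are fetched. The function names, the index, and the field names (`level`, `timestamp`, `message`) are hypothetical.

```python
def es_read_options(nodes):
    """Build ES-Hadoop read settings with filter push-down enabled."""
    # "pushdown" is on by default in ES-Hadoop; it is set here to make the intent explicit.
    return {"es.nodes": nodes, "pushdown": "true"}


def load_errors_since(spark, resource, nodes, since):
    """Read only ERROR documents newer than `since` from Elasticsearch.

    spark is expected to be a pyspark.sql.SparkSession (assumption: created elsewhere).
    """
    df = (spark.read
               .format("org.elasticsearch.spark.sql")
               .options(**es_read_options(nodes))
               .load(resource))
    # These filters are candidates for push-down; the column projection
    # also limits which fields are requested from Elasticsearch.
    return (df.filter(df["level"] == "ERROR")
              .filter(df["timestamp"] >= since)
              .select("timestamp", "message"))


print(es_read_options("es-host:9200"))
# errors = load_errors_since(spark, "logs", "es-host:9200", "2016-04-01")
# errors.explain()  # the physical plan should list the pushed filters
```

Calling `explain()` on the result is one way to check whether a given filter was actually pushed down rather than evaluated in Spark.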
