Load data into HDFS using ES-Spark

costin · May 14, 2015, 6:22am

There are various reasons why saveAsTextFile might take a long time - typically it is because the parallelism is low (there's only one task handling it) or because a large number of values (sometimes all of them) sit under the same key.
What does your RDD look like? Is there any information from Spark on what it is doing during the wait? What's your hardware?
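
One way to check both suspects is to look at how many partitions the RDD has and how evenly the records are spread across them. The following is only a minimal PySpark sketch, assuming `rdd` is the RDD you are about to save; the helper names (`partition_sizes`, `skew_ratio`, `save_with_more_partitions`) and the partition count are made up for illustration.

```python
def partition_sizes(rdd):
    """Return the number of records held by each partition of the RDD."""
    # glom() turns every partition into a single list, so len() gives its size.
    return rdd.glom().map(len).collect()


def skew_ratio(sizes):
    """Largest partition relative to the average partition size.

    A value near 1.0 means an even spread; a value close to len(sizes)
    means almost everything sits in one partition.
    """
    total = sum(sizes)
    if total == 0:
        return 0.0
    return max(sizes) * len(sizes) / total


def save_with_more_partitions(rdd, path, num_partitions):
    """Spread the records over more tasks before writing text files.

    num_partitions is an assumption to tune; repartition() triggers a shuffle.
    """
    rdd.repartition(num_partitions).saveAsTextFile(path)
```

If `rdd.getNumPartitions()` reports 1, or `skew_ratio(partition_sizes(rdd))` is close to the partition count, a single task is doing most of the write, and repartitioning before the save may help.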

As for es-hadoop, in a nutshell it's a connector between Elasticsearch and Hadoop, so it likely fits the latter description.
es-hadoop itself doesn't store any state; rather, it helps move data between Elasticsearch and Hadoop.
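
To make the "connector" role concrete, here is a rough PySpark sketch of pulling an index through es-hadoop and writing it to HDFS. It assumes the es-hadoop jar is on the Spark classpath and that `sc` is an existing SparkContext; the node address, index name, output path, and helper names are placeholders, not anything from your setup.

```python
import json


def es_read_conf(nodes, resource):
    """Build the es-hadoop settings for reading one index."""
    return {
        "es.nodes": nodes,        # e.g. "localhost:9200" (placeholder)
        "es.resource": resource,  # index to read (placeholder)
    }


def copy_index_to_hdfs(sc, nodes, resource, out_path):
    """Read documents from Elasticsearch and save them as JSON lines."""
    rdd = sc.newAPIHadoopRDD(
        inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
        keyClass="org.apache.hadoop.io.NullWritable",
        valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
        conf=es_read_conf(nodes, resource),
    )
    # Each element is an (id, document) pair; keep the document only.
    # Assumes the documents contain JSON-serializable values.
    rdd.map(lambda kv: json.dumps(kv[1])).saveAsTextFile(out_path)
```

Nothing is kept inside es-hadoop here: the data comes straight from Elasticsearch and lands in whatever path `saveAsTextFile` is given.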
