There can be various reasons why saveAsTextFile takes a long time. Typically it is because the parallelism is low (only one task is handling the write) or because a large number of values (sometimes all of them) sit under the same key.
What does your RDD look like? Do you have any information on what Spark is doing during the wait? What's your hardware?
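If it helps to gather that information, here is a rough sketch of how one might check for key skew. The helper `key_skew` and the variable name `rdd` are my own illustrative names, not part of Spark; the helper is plain Python and only summarizes a list of keys. The commented lines assume a PySpark pair RDD and use `getNumPartitions()`, `keys()`, `sample()` and `collect()` to obtain the partition count and a sample of keys.

```python
from collections import Counter


def key_skew(keys, top=3):
    """Summarize how unevenly records are spread across keys (illustrative helper)."""
    counts = Counter(keys)
    total = sum(counts.values())
    if total == 0:
        return {"total": 0, "distinct": 0, "top": [], "max_share": 0.0}
    most_common = counts.most_common(top)
    return {
        "total": total,                            # number of sampled records
        "distinct": len(counts),                   # number of distinct keys seen
        "top": most_common,                        # heaviest keys with their counts
        "max_share": most_common[0][1] / total,    # fraction held by the heaviest key
    }


# With a live pair RDD (hypothetical name `rdd`), the input could come from:
#   print(rdd.getNumPartitions())                        # how many tasks will write
#   sampled = rdd.keys().sample(False, 0.01).collect()   # about 1% of the keys
# A small made-up list stands in for the sample here:
sampled = ["a"] * 8 + ["b", "c"]
report = key_skew(sampled)
print(report)
```

A partition count of 1, or a `max_share` close to 1.0, would point to the two causes mentioned above: a single task doing all the work, or most values sitting under one key.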
As for es-hadoop, in a nutshell it's a connector between Elasticsearch and Hadoop so it likely fits the latter description.
es-hadoop itself doesn't store any state; rather, it helps move data between Elasticsearch and Hadoop.