I would rephrase this: within the same RDD/DataFrame, and thus within the same Spark task, you can't read and write data at the same time.
You can, however, use temporary storage to stream the data between the two.
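For illustration, here's a minimal sketch of that staging approach, assuming the elasticsearch-spark SQL connector; the paths and index names are placeholders:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("es-staging").getOrCreate()

// Job 1: read from the source index and persist to a temporary location,
// so the read finishes before any write starts.
val source = spark.read
  .format("org.elasticsearch.spark.sql")
  .load("source-index")
source.write.mode("overwrite").parquet("/tmp/es-staging")

// Job 2: read the staged data back and write it to the target index.
spark.read.parquet("/tmp/es-staging")
  .write
  .format("org.elasticsearch.spark.sql")
  .save("target-index")
```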
We would like to use Elasticsearch as a data source for a BI application.
In this use case, it would be a nice feature if we could specify the `es.nodes` IP via `read.option()` (to handle multiple ES configurations and switch between them on the fly).
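Unless I'm misreading the connector, something like this should already work, since per-read options override the global configuration; the host and index names below are made up:

```scala
// Each DataFrame targets a different cluster by overriding es.nodes per read.
val clusterA = spark.read
  .format("org.elasticsearch.spark.sql")
  .option("es.nodes", "es-cluster-a:9200")
  .load("my-index")

val clusterB = spark.read
  .format("org.elasticsearch.spark.sql")
  .option("es.nodes", "es-cluster-b:9200")
  .load("my-index")
```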
We have just one remaining problem, with array types and nested schemas (as mentioned here: Spark-sql does not seem to read from a nested schema), before Elasticsearch is fully usable as a data source for our BI application.
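As a possible workaround for the array part, and assuming your connector version supports the `es.read.field.as.array.include` setting, array fields can be declared explicitly since the ES mapping alone doesn't distinguish them; the field and index names here are placeholders:

```scala
// Tell the connector which fields should be read as arrays,
// since the Elasticsearch mapping does not encode this.
val df = spark.read
  .format("org.elasticsearch.spark.sql")
  .option("es.read.field.as.array.include", "tags,comments.replies")
  .load("my-index")

df.printSchema() // the listed fields should now appear as ArrayType
```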