I have been trying to write data from my Spark Structured Streaming job to Elasticsearch, following the example here.
I am using Spark 2.3.0 and elasticsearch-hadoop:6.3.1 (specified through --jars in spark-submit).
The (Python) code I am using (below) contains some additional configuration. Note that it uses
.format("org.elasticsearch.spark.sql") rather than
.format("es"), following the advice here.
query = stream.writeStream \
    .outputMode("append") \
    .format("org.elasticsearch.spark.sql") \
    .option("es.nodes", <address>) \
    .option("es.port", "9200") \
    .option("es.nodes.discovery", "true") \
    .option("es.http.timeout", "20s") \
    .option("es.http.retries", "0") \
    .option("checkpointLocation", "~/checkpoint_es") \
    .start("indexTest/typeTest")

query.awaitTermination()
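For reference, elasticsearch-hadoop also accepts the target index/type through the es.resource option instead of the start() path; a minimal sketch of that variant, with `<address>` kept as a placeholder as above:

```python
# Same sink, but the index/type is given via es.resource rather than
# as the path argument to start(). <address> is a placeholder.
query = stream.writeStream \
    .outputMode("append") \
    .format("org.elasticsearch.spark.sql") \
    .option("es.nodes", <address>) \
    .option("es.port", "9200") \
    .option("es.resource", "indexTest/typeTest") \
    .option("checkpointLocation", "~/checkpoint_es") \
    .start()
```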
When I run this, I get the following error:
org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot determine write shards for [structuredTest/testType]; likely its format is incorrect (maybe it contains illegal characters? or all shards failed?)
As far as I can tell, the index format is correct, but I cannot get it working. Has anyone else seen this?