I have been trying to write data from my Spark Structured Streaming job to Elasticsearch, following the example here.
I am using Spark 2.3.0 and elasticsearch-hadoop:6.3.1 (specified through --jars in spark-submit).
The (Python) code I am using (below) contains some additional configuration. Note that it uses
.format("org.elasticsearch.spark.sql") rather than
.format("es"), following the advice here.
query = stream.writeStream \
    .outputMode("append") \
    .format("org.elasticsearch.spark.sql") \
    .option("es.nodes", <address>) \
    .option("es.port", "9200") \
    .option("es.nodes.discovery", "true") \
    .option("es.http.timeout", "20s") \
    .option("es.http.retries", "0") \
    .option("checkpointLocation", "~/checkpoint_es") \
    .start("indexTest/typeTest")

query.awaitTermination()
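For reference, elasticsearch-hadoop also accepts the target index/type through the es.resource option instead of the start() path; a minimal sketch of that variant, with `<address>` kept as a placeholder as above:

```python
# Same sink, but the index/type is given via es.resource rather than
# as the path argument to start(). <address> is a placeholder.
query = stream.writeStream \
    .outputMode("append") \
    .format("org.elasticsearch.spark.sql") \
    .option("es.nodes", <address>) \
    .option("es.port", "9200") \
    .option("es.resource", "indexTest/typeTest") \
    .option("checkpointLocation", "~/checkpoint_es") \
    .start()
```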
When I run this, I get the following error:
org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot determine write shards for [structuredTest/testType]; likely its format is incorrect (maybe it contains illegal characters? or all shards failed?)
As far as I can tell, the index format is correct, but I cannot get it working. Has anyone else seen this?