Structured Streaming ES Sink


(Matthew Jackson) #1

I have been trying to output data from my Spark Structured Streaming job to Elastic Search, following the example here.

I am using Spark 2.3.0 and elasticsearch-hadoop:6.3.1 (specified through --jars in spark submit).

The (python) code I am using (below) contains some additional configuration. Notice .format("org.elasticsearch.spark.sql") as opposed to .format("es"), following the advice here.

query = stream.writeStream \
              .outputMode("append") \
              .format("org.elasticsearch.spark.sql") \
              .option("es.nodes", <address>) \
              .option("es.port", "9200") \
              .option("es.nodes.discovery", "true") \
              .option("es.http.timeout", "20s") \
              .option("es.http.retries", "0") \
              .option("checkpointLocation", "~/checkpoint_es") \
              .start(indexTest/typeTest)

query.awaitTermination()

When I try to run this I am getting the following error.

org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot determine write shards for [structuredTest/testType]; likely its format is incorrect (maybe it contains illegal characters? or all shards failed?)

As far as I am aware this is not the case, but I can not get it working. Has anyone else seen this?


(Matthew Jackson) #2

Update

I realised that the upper case characters were causing the error.


(system) #3

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.