I have been trying to write data from my Spark Structured Streaming job to Elasticsearch, following the example here.
I am using Spark 2.3.0 and elasticsearch-hadoop:6.3.1 (specified through --jars in spark-submit).
The (Python) code I am using (below) contains some additional configuration. Note that it uses
.format("org.elasticsearch.spark.sql") rather than
.format("es"), following the advice here.
query = stream.writeStream \
    .outputMode("append") \
    .format("org.elasticsearch.spark.sql") \
    .option("es.nodes", <address>) \
    .option("es.port", "9200") \
    .option("es.nodes.discovery", "true") \
    .option("es.http.timeout", "20s") \
    .option("es.http.retries", "0") \
    .option("checkpointLocation", "~/checkpoint_es") \
    .start("indexTest/typeTest")

query.awaitTermination()
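For reference, elasticsearch-hadoop also accepts the target index/type through the es.resource option instead of the start() path; a minimal sketch of that variant, with `<address>` kept as a placeholder as above:

```python
# Same sink, but the index/type is given via es.resource rather than
# as the path argument to start(). <address> is a placeholder.
query = stream.writeStream \
    .outputMode("append") \
    .format("org.elasticsearch.spark.sql") \
    .option("es.nodes", <address>) \
    .option("es.port", "9200") \
    .option("es.resource", "indexTest/typeTest") \
    .option("checkpointLocation", "~/checkpoint_es") \
    .start()
```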
When I run this, I get the following error:
org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Cannot determine write shards for [structuredTest/testType]; likely its format is incorrect (maybe it contains illegal characters? or all shards failed?)
As far as I can tell, the index format is correct, but I cannot get it working. Has anyone else seen this?