Reading structured streaming data from Elasticsearch into Spark using Python

I believe Elasticsearch supports Spark Structured Streaming since version 6. I have seen examples of WRITING structured streaming data from spark to Elasticsearch in append mode. But I have not seen a single example of reading from Elasticsearch.

For example, here is how you write to ES:

query = df.writeStream \
.outputMode("append") \
.queryName("writing_to_es") \
.format("org.elasticsearch.spark.sql") \
.option("checkpointLocation", "/tmp/") \
.option("es.resource", "index/type") \
.option("es.nodes", "localhost") \
.start()

query.awaitTermination()

But how do you read from ES using spark structured streaming ? Please share an example .....

Thanks,
Saurab

Hi @saurab. As you mention, es-spark does support writing to Elasticsearch with spark structured streaming. But it does not support reading from Elasticsearch as a spark structured streaming source.

Here is the ticket for the work -- Support Spark Structured Streaming read from ES · Issue #1227 · elastic/elasticsearch-hadoop · GitHub. The problem is that we currently don't have an efficient way to listen for changes to an Elasticsearch index.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.