How to construct a Spark DStream from a continuous series of RDDs?


(Kramer Li) #1

I'm reading data from Elasticsearch into Spark every 5 minutes, so there is a new RDD every 5 minutes.

I hope to construct a DStream from these RDDs, so that I can generate reports over the data from the last day, the last hour, the last 5 minutes, and so on.

To construct the DStream, I was thinking about creating my own receiver, but the official Spark documentation only covers doing so in Scala or Java, and I use Python.

So do you know of any way to do it? Since a DStream is just a series of RDDs, it should be possible to create a DStream from a continuous series of RDDs; I just don't know how. Please give some advice.

apache elasticsearch


(Costin Leau) #2

I'm afraid I can't help when it comes to Python. ES-Hadoop is based on the JVM; Spark's Python integration can leverage the InputFormat/OutputFormat, but that is not enough, in terms of efficiency, when it comes to building an RDD.
There is, however, a community Python wrapper around ES-Hadoop available on GitHub; maybe that one will address your problem.

Pardon?


(system) #3