I have a DataFrame containing a list of cities, as follows:
val cities = sc.parallelize(Seq("New York")).toDF()
Now, for each city, I would like to query Elasticsearch and build a result set, along the lines of:
val cities = sc.parallelize(Seq("New York")).toDF()
cities.foreach(r => {
  val city = r.getString(0)
  val dfs = sqlContext.esDF("cities/docs", "?q=" + city) // returns a DataFrame, which triggers the exception
})
The problem is that Spark does not allow nested operations that return DataFrames. What options do I have to iterate over a DataFrame and collect the results?
This seems like something better suited to using a regular Java REST client to perform the search from within the foreach function. If you decide to do that, I would suggest using foreachPartition instead of foreach, so that you can batch the queries for a whole partition into a single request to Elasticsearch. It also lets you tear down the client once all the data in the partition has been consumed.
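A minimal sketch of that idea, using plain `HttpURLConnection` as the client and Elasticsearch's `_msearch` endpoint to batch one query per city. The `esHost` value and the `"city"` field name are assumptions for illustration; the `cities` index matches the question. The body-building helper is a pure function, kept separate so it can be tested without a cluster:

```scala
import java.net.{HttpURLConnection, URL}
import java.nio.charset.StandardCharsets
import scala.io.Source

// Pure helper: turn a batch of city names into an _msearch request body
// (one empty header line plus one query line per city, newline-terminated).
def msearchBody(cities: Seq[String]): String =
  cities.map { c =>
    "{}" + "\n" + s"""{"query":{"match":{"city":"$c"}}}"""
  }.mkString("", "\n", "\n")

// Assumed host of an Elasticsearch node reachable from the executors.
val esHost = "localhost"

// One request per partition instead of one per row; the connection is
// torn down after the partition's data has been consumed.
cities.foreachPartition { rows =>
  val batch = rows.map(_.getString(0)).toSeq
  if (batch.nonEmpty) {
    val conn = new URL(s"http://$esHost:9200/cities/_msearch")
      .openConnection().asInstanceOf[HttpURLConnection]
    conn.setRequestMethod("POST")
    conn.setDoOutput(true)
    conn.setRequestProperty("Content-Type", "application/x-ndjson")
    conn.getOutputStream.write(msearchBody(batch).getBytes(StandardCharsets.UTF_8))
    val responses = Source.fromInputStream(conn.getInputStream).mkString
    // parse `responses` here: one result set per city, in batch order
    conn.disconnect()
  }
}
```

Since the partition's iterator is drained into a local `Seq` before the request, very large partitions may need to be chunked further, but the shape above keeps the per-request overhead off the row-by-row path.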