Spark SQL and standalone config


(Serge Semichev) #1

Working in a Databricks environment where sqlContext already exists, I can only use a standalone config, like:

var config: Map[String, String] = Map()
config += ("es.nodes" -> "es-node-address")
config += ("es.resource" -> "persons/person")

To read the data into an RDD I can use this code:

val persons = sc.esRDD(config)
persons.count

But it cannot be converted to a DataFrame with persons.toDF, due to the error "Schema for type scala.AnyRef is not supported".
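That error is consistent with what esRDD returns: an RDD of (document id, Map[String, AnyRef]) pairs, and Spark SQL cannot infer a schema for AnyRef values. One workaround is to map each document into a case class before calling toDF. A minimal sketch, assuming hypothetical "name" and "age" fields in the persons/person documents:

case class Person(name: String, age: Int) // hypothetical fields, adjust to your mapping

val personsDF = persons.map { case (_, doc) =>
  // doc is a Map[String, AnyRef]; convert each field to a concrete type
  Person(doc("name").toString, doc("age").toString.toInt)
}.toDF()

This gives Spark a concrete schema to work with, though the connector's own DataFrame support (below) avoids the manual conversion entirely.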

How can I use my standalone config to load data into a DataFrame? It seems that the "load" method of org.apache.spark.sql.DataFrameReader doesn't have an overloaded version for a custom Map.
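While load itself takes only a path, DataFrameReader does accept a whole Map[String, String] through its options method, so the standalone config can be passed that way. A sketch under that assumption, reusing the config map from above (es-node-address and persons/person are placeholders):

import org.apache.spark.sql.DataFrame

val config: Map[String, String] = Map(
  "es.nodes"    -> "es-node-address",
  "es.resource" -> "persons/person"
)

// options(Map) merges the entire config into the reader; with
// es.resource already set, load() needs no path argument.
val personsDF: DataFrame = sqlContext.read
  .format("es")
  .options(config)
  .load()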

import org.elasticsearch.spark.sql._

// DataFrame schema automatically inferred
val df = sqlContext.read.format("es").load("buckethead/albums")

// operations get pushed down and translated at runtime to Elasticsearch QueryDSL
val playlist = df.filter(df("category").equalTo("pikes").and(df("year").geq(2016)))

(system) #2