Can't get Spark to actually retrieve data

Hi,
I've set up Hadoop + Spark 1.6 via CDH and added the latest Zeppelin, including the latest es-hadoop binding, and am now trying to load some data from my ES cluster. While it retrieves the mapping and also issues the query, it deletes the scroll ID immediately after the query, without ever fetching any data. Consequently, I end up with the schema but no data in Zeppelin.

I'm really out of ideas here; can anyone help? Thank you!

Here are some of my queries:

%spark
import org.apache.spark.sql.SQLContext
import org.elasticsearch.spark.sql._

val sql = new SQLContext(sc)
sql.esDF("logstash-2017.06.21/logs", Map(
  "es.nodes" -> "my.host.name",
  "es.read.field.include" -> "host")).registerTempTable("logs")
z.show(sql.sql("select count(host) from logs"))

returns 0

Or even simpler:

%spark
sqlContext.read.format("es").load("logstash-*/logs").limit(10).show()

Returns a nice table with a bunch of columns, but no rows.

Could you increase the logging of the org.elasticsearch.hadoop.rest.commonshttp package to TRACE and post all the relevant logs here? That should help us track down what the connector is receiving.
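For reference, on a typical CDH/Zeppelin setup this is done in the log4j configuration used by the Spark interpreter (the exact file location varies by install; the path and second logger line below are assumptions, only the `org.elasticsearch.hadoop.rest.commonshttp` package name comes from the request above):

```properties
# Assumed snippet for log4j.properties (e.g. under Zeppelin's conf/
# directory or the Spark conf used by the interpreter -- adjust to
# wherever your logging config actually lives).

# TRACE-level logging for the es-hadoop HTTP transport, as requested:
log4j.logger.org.elasticsearch.hadoop.rest.commonshttp=TRACE

# Optionally also trace the broader REST layer of the connector
# (assumption: useful for seeing the scroll requests themselves):
log4j.logger.org.elasticsearch.hadoop.rest=TRACE
```

After changing the file, restart the Zeppelin interpreter so the new logging levels take effect.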

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.