Hi,
I am trying to read from Elasticsearch using Spark SQL and getting the
exception below.
My environment is CDH 5.3 with Spark 1.2.0 and Elasticsearch 1.4.4.
Since Spark SQL is not officially supported on CDH 5.3, I added the Hive
jars to the Spark classpath in compute-classpath.sh.
I also added elasticsearch-hadoop-2.1.0.Beta3.jar to the Spark classpath in
compute-classpath.sh.
I also tried adding the Hive, elasticsearch-hadoop, and elasticsearch-spark
jars to the SPARK_CLASSPATH environment variable before running
spark-submit, but got the same exception.
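For completeness, here is roughly how I also tried shipping the connector
jar from the driver side instead of editing compute-classpath.sh (a minimal
sketch; the jar path below is a placeholder, not my actual layout):

import org.apache.spark.{SparkConf, SparkContext}

object ClasspathSketch {
  def main(args: Array[String]) {
    // Ship the connector jar with the job instead of editing
    // compute-classpath.sh; the path below is a placeholder.
    val conf = new SparkConf()
      .setAppName("ClasspathSketch")
      .setJars(Seq("/path/to/elasticsearch-hadoop-2.1.0.Beta3.jar"))
    val sc = new SparkContext(conf)
    // ... job code ...
    sc.stop()
  }
}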
Exception in thread "main" java.lang.RuntimeException: Failed to load class for data source: org.elasticsearch.spark.sql
    at scala.sys.package$.error(package.scala:27)
    at org.apache.spark.sql.sources.CreateTableUsing.run(ddl.scala:99)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:67)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:67)
    at org.apache.spark.sql.execution.ExecutedCommand.execute(commands.scala:75)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:425)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:425)
    at org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
    at org.apache.spark.sql.SchemaRDD.<init>(SchemaRDD.scala:108)
    at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:303)
    at com.informatica.sats.datamgtsrv.Percolator$.main(Percolator.scala:29)
    at com.informatica.sats.datamgtsrv.Percolator.main(Percolator.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
The code I am trying to run:
import org.apache.spark._
import org.apache.spark.sql._
import org.apache.spark.SparkContext._
import org.elasticsearch.spark._
import org.elasticsearch.spark.sql._

object MyTest {
  def main(args: Array[String]) {
    val sparkConf = new SparkConf().setAppName("MyTest")
    val sc = new SparkContext(sparkConf)
    val sqlContext = new SQLContext(sc)

    // Register the Elasticsearch index/type as a temporary table.
    sqlContext.sql("CREATE TEMPORARY TABLE INTERVALS " +
      "USING org.elasticsearch.spark.sql " +
      "OPTIONS (resource 'events/intervals')")

    // Query the table and print every field of every row.
    val allRDD = sqlContext.sql("SELECT * FROM INTERVALS")
    allRDD.foreach(row => row.foreach(elem => print(elem + "\n\n")))
  }
}
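As a sanity check that the connector jar itself is visible, the native RDD
API (pulled in by the org.elasticsearch.spark import above) can read the
same resource directly without going through the data source resolution; a
minimal sketch, assuming the same events/intervals index/type:

import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._ // adds esRDD to SparkContext

object EsRddSketch {
  def main(args: Array[String]) {
    val sc = new SparkContext(new SparkConf().setAppName("EsRddSketch"))
    // esRDD yields (documentId, fieldMap) pairs from the index/type.
    val intervals = sc.esRDD("events/intervals")
    intervals.take(10).foreach { case (id, doc) => println(id + " -> " + doc) }
    sc.stop()
  }
}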
I checked the Spark source code (org/apache/spark/sql/sources/ddl.scala)
and saw that the run method of the CreateTableUsing class, when it cannot
load the data source class named in the USING clause directly, falls back
to loading a class called DefaultSource from that package.
However, there is no such class in the org.elasticsearch.spark.sql package
in the official Elasticsearch builds.
I checked the following jars:
elasticsearch-spark_2.10-2.1.0.Beta3.jar
elasticsearch-spark_2.10-2.1.0.Beta2.jar
elasticsearch-spark_2.10-2.1.0.Beta1.jar
elasticsearch-hadoop-2.1.0.Beta3.jar
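For illustration, this is the kind of class Spark 1.2 appears to be looking
for. The skeleton below is hypothetical (the whole point is that it does
not exist in these jars); it just shows the RelationProvider contract from
org.apache.spark.sql.sources that the ".DefaultSource" fallback expects:

package org.elasticsearch.spark.sql

import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.sources.{BaseRelation, RelationProvider}

// Hypothetical: Spark 1.2 appends ".DefaultSource" to the USING provider
// name and expects the resulting class to implement RelationProvider.
class DefaultSource extends RelationProvider {
  override def createRelation(
      sqlContext: SQLContext,
      parameters: Map[String, String]): BaseRelation = {
    // A real implementation would build a relation backed by the
    // "resource" option (e.g. "events/intervals"); omitted here.
    ???
  }
}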
Can you please advise why this problem happens and how to resolve it?
Thanks,
Dmitriy Fingerman