We are running Databricks 7.6 (includes Apache Spark 3.0.1, Scala 2.12)
and installed the connector via jip install org.elasticsearch:elasticsearch-spark-30_2.12:7.12.1. Now I am getting this error: java.lang.ClassNotFoundException: Failed to find data source: org.elasticsearch.spark.sql.
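For context, the error comes from Spark's data-source lookup. A read along these lines reproduces it when the connector jar is not actually on the cluster classpath (a sketch, not our actual notebook code; the index name is made up and spark is the notebook's SparkSession):

// Any read/write that names the connector as a data source triggers the lookup;
// without the elasticsearch-spark jar on the classpath it fails with
// java.lang.ClassNotFoundException: Failed to find data source: org.elasticsearch.spark.sql
val df = spark.read
  .format("org.elasticsearch.spark.sql")
  .load("some-index")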
@mlo Can you run a test that utilizes Spark 3.0 classes outside of spark.sql?
> import org.elasticsearch.spark._
> val conf = ...
> val sc = new SparkContext(conf)
> sc.esRDD("radio/artists", "?q=me*")
Also, post your class here for review
AND the output of:
jip list
java -version
And the Elasticsearch version (you posted 7.1 here?)
And an ls of the Elasticsearch home dir
Hey Matt. Using Elastic 7.11.
Also, I ran this as you suggested and it seemed to work without error.
%scala
import org.apache.spark.SparkConf
import org.elasticsearch.spark._
val sc = SparkContext.getOrCreate()
sc.esRDD("radio/artists", "?q=me*")
Result:
import org.apache.spark.SparkConf
import org.elasticsearch.spark._
sc: org.apache.spark.SparkContext = org.apache.spark.SparkContext@6b1cbda7
res2: org.apache.spark.rdd.RDD[(String, scala.collection.Map[String,AnyRef])] = ScalaEsRDD[44] at RDD at AbstractEsRDD.scala:41
Also
%sh
java -version
openjdk version "1.8.0_282"
OpenJDK Runtime Environment (Zulu 8.52.0.23-CA-linux64) (build 1.8.0_282-b08)
OpenJDK 64-Bit Server VM (Zulu 8.52.0.23-CA-linux64) (build 25.282-b08, mixed mode)
jip list returns nothing
I put together the project and went with Java instead of Scala (again, just testing out different environments), and the class that uses import org.elasticsearch.spark.sql.api.java.JavaEsSparkSQL; now executes correctly.
pom.xml
<dependency>
  <groupId>org.elasticsearch</groupId>
  <artifactId>elasticsearch-spark-30_2.12</artifactId>
  <version>7.13.2</version>
</dependency>
<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-sql_2.12</artifactId>
  <version>3.0.2</version>
  <!-- <scope>provided</scope> -->
</dependency>
Either the packaged jar needs to contain both spark-sql*.jar and elasticsearch-spark*.jar, or both jars need to be installed on all clients that execute the class.
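One quick way to verify the "installed on all clients" half of that is to check whether the connector's data-source provider class resolves on the JVM in question (a sketch; org.elasticsearch.spark.sql.DefaultSource is the provider class the connector registers behind that format name):

// Classpath sanity check: if this throws ClassNotFoundException,
// the elasticsearch-spark connector jar is missing on this JVM
try {
  Class.forName("org.elasticsearch.spark.sql.DefaultSource")
  println("elasticsearch-spark connector found on the classpath")
} catch {
  case _: ClassNotFoundException =>
    println("elasticsearch-spark connector NOT found on the classpath")
}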
Class:
package hello;

import java.util.Map;

import org.apache.commons.lang.StringUtils;
import org.joda.time.LocalTime;

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.SparkConf;
//import org.apache.spark.sql.api.java.JavaSQLContext;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

import static org.elasticsearch.spark.rdd.api.java.JavaEsSpark.*;
import org.elasticsearch.spark.sql.api.java.JavaEsSparkSQL;

public class HelloWorld {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("AppName").setMaster("local");
        conf.set("es.index.auto.create", "true");
        conf.set("es.nodes.wan.only", "true");
        conf.set("es.nodes", "https://******.eastus2.azure.elastic-cloud.com");
        conf.set("es.port", "9243");
        conf.set("es.net.http.auth.user", "********");
        conf.set("es.net.http.auth.pass", "*******");

        JavaSparkContext jsc = new JavaSparkContext(conf);

        // JavaRDD<Map<String, Object>> esRDD =
        //     esRDD(jsc, "filebeat-*/_doc", "?q=test").values();
        // System.out.println(StringUtils.join(esRDD.collect(), ","));

        SQLContext sql = new SQLContext(jsc);

        // test-message is a very simple index on my cloud cluster
        Dataset<Row> people = JavaEsSparkSQL.esDF(sql, "test-message");
        System.out.println("String: " + people.first().toString());
    }
}
This is working in local mode.
I noticed (or rather got hung up on) that DataFrame has moved to Dataset, but that was Spark-related, not a classpath issue with org.elasticsearch.spark.sql.
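For reference, that change is just the Spark 2+ type alias at work; a minimal sketch, assuming an active SparkSession named spark:

import org.apache.spark.sql.{DataFrame, Dataset, Row, SparkSession}

val spark = SparkSession.builder().getOrCreate()

// Since Spark 2.x, DataFrame is a type alias for Dataset[Row], so APIs that
// used to return DataFrame now show up as Dataset<Row> on the Java side
val df: DataFrame = spark.range(3).toDF("n")
val ds: Dataset[Row] = df // same object, no conversion needed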
Spark pseudocode:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.elasticsearch.spark.sql._

val conf = ... // same es.* options as in the Java example (filled in below)
val sc = new SparkContext(conf)
val sql = new SQLContext(sc)
val people = sql.esDF("spark/people")
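A filled-in version of that conf, mirroring the settings from the Java class above (endpoint and credentials are placeholders):

import org.apache.spark.SparkConf

// Same es.* settings as the Java example, expressed in Scala
val conf = new SparkConf()
  .setAppName("AppName")
  .set("es.index.auto.create", "true")
  .set("es.nodes.wan.only", "true")
  .set("es.nodes", "https://<cluster>.eastus2.azure.elastic-cloud.com")
  .set("es.port", "9243")
  .set("es.net.http.auth.user", "<user>")
  .set("es.net.http.auth.pass", "<password>")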