Error in Databricks 7.6 and Elastic 7.1.1 **java.lang.ClassNotFoundException: Failed to find data source: org.elasticsearch.spark.sql.**

We are running Databricks 7.6 (includes Apache Spark 3.0.1, Scala 2.12)
and installed the connector via `jip install org.elasticsearch:elasticsearch-spark-30_2.12:7.12.1`, and now I am getting this error: java.lang.ClassNotFoundException: Failed to find data source: org.elasticsearch.spark.sql.
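
For reference, this is the kind of call that throws that exception when the connector jar isn't on the cluster classpath - a minimal sketch, with a hypothetical index name:

    // Fails with "Failed to find data source: org.elasticsearch.spark.sql"
    // when elasticsearch-spark-30_2.12 isn't installed on the cluster.
    val df = spark.read
      .format("org.elasticsearch.spark.sql")
      .load("my-index")   // hypothetical index name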

@mlo Can you run a test that utilizes Spark 3.0 classes outside of spark.sql?

>     import org.elasticsearch.spark._
>     val conf = ...
>     val sc = new SparkContext(conf)
>     sc.esRDD("radio/artists", "?q=me*")

Also, post your class here for review, plus the output of:

    jip list
    java -version

And your Elasticsearch version (you posted 7.1 here?), plus an `ls` of the Elasticsearch home dir.


Hey Matt. Using Elastic 7.11

Also, I ran this as you suggested and it seemed to work without error.

    import org.apache.spark.SparkConf
    import org.apache.spark.SparkContext   // needed when SparkContext isn't already in scope outside the notebook
    import org.elasticsearch.spark._

    val sc = SparkContext.getOrCreate()
    sc.esRDD("radio/artists", "?q=me*")


    import org.apache.spark.SparkConf
    import org.elasticsearch.spark._
    sc: org.apache.spark.SparkContext = org.apache.spark.SparkContext@6b1cbda7
    res2: org.apache.spark.rdd.RDD[(String, scala.collection.Map[String,AnyRef])] = ScalaEsRDD[44] at RDD at AbstractEsRDD.scala:41


java -version

    openjdk version "1.8.0_282"
    OpenJDK Runtime Environment (Zulu) (build 1.8.0_282-b08)
    OpenJDK 64-Bit Server VM (Zulu) (build 25.282-b08, mixed mode)

jip list returns nothing

I put together the project and went with Java instead of Scala - again, just testing out different environments - and I got the imports and the class to execute correctly.



In the pom.xml, the `provided` scope on the elasticsearch-spark dependency is commented out so the connector gets bundled:

    <!-- <scope>provided</scope> -->

Either the packaged jar needs to contain both the spark-sql*.jar and elasticsearch-spark*.jar, or both jars have to be installed on every client that executes the class.
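
A sketch of the relevant pom.xml dependency, assuming the same coordinates used with jip earlier in the thread (the commented-out scope is the part that matters for bundling):

    <dependency>
        <groupId>org.elasticsearch</groupId>
        <artifactId>elasticsearch-spark-30_2.12</artifactId>
        <version>7.12.1</version>
        <!-- <scope>provided</scope> -->
    </dependency>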

Class --

    package hello;

    import java.util.Map;
    import org.apache.commons.lang.StringUtils;
    import org.joda.time.LocalTime;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    import org.apache.spark.sql.SQLContext;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    import org.elasticsearch.spark.sql.api.java.JavaEsSparkSQL;
    // The original static import was garbled by the forum ("import static*;");
    // JavaEsSpark is presumably what it referred to, since the commented-out
    // code below calls esRDD(...) unqualified.
    import static org.elasticsearch.spark.rdd.api.java.JavaEsSpark.*;

    public class HelloWorld {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("AppName").setMaster("local");
            conf.set("", "true");       // config key lost/redacted in the original post
            conf.set("es.nodes.wan.only", "true");
            conf.set("es.nodes", "https://******");
            conf.set("es.port", "9243");
            conf.set("", "********");   // config key lost/redacted in the original post
            conf.set("", "*******");    // config key lost/redacted in the original post

            JavaSparkContext jsc = new JavaSparkContext(conf);

            // RDD variant, via the static JavaEsSpark.esRDD import above:
            // JavaRDD<Map<String, Object>> esRDD =
            //         esRDD(jsc, "filebeat-*/_doc", "?q=test").values();
            // System.out.println(StringUtils.join(esRDD.collect(), ","));

            SQLContext sql = new SQLContext(jsc);
            // test-message is a very simple index on my cloud cluster
            Dataset<Row> people = JavaEsSparkSQL.esDF(sql, "test-message");
            System.out.println("String: " + people.first().toString());
        }
    }

This is working in local mode.
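
Beyond local mode, one way to cover the "install both jars on all clients" option without building a fat jar is to have spark-submit pull the connector (a sketch, reusing the coordinates from earlier; app.jar is a placeholder):

    spark-submit \
      --class hello.HelloWorld \
      --packages org.elasticsearch:elasticsearch-spark-30_2.12:7.12.1 \
      app.jar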

I noticed - or got hung up on - DataFrame having moved to Dataset, but that was Spark-related, not classpath-related to org.elasticsearch.spark.sql.
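
For reference, since Spark 2.0 DataFrame is just a type alias in the Scala API, which is why the Java side hands back Dataset<Row>:

    // from the org.apache.spark.sql package object (Spark 2.0+)
    type DataFrame = Dataset[Row]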

Spark pseudo-code -->

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext
    import org.elasticsearch.spark.sql._

    val conf = new SparkConf()   // set es.nodes, es.port, auth, etc., as in the Java example above
    val sc = new SparkContext(conf)
    val sql = new SQLContext(sc)
    val people = sql.esDF("spark/people")
