Error in Databricks 7.6 and Elastic 7.1.1 **java.lang.ClassNotFoundException: Failed to find data source: org.elasticsearch.spark.sql.**

We are running Databricks 7.6 (includes Apache Spark 3.0.1, Scala 2.12)
and installed the connector via jip install org.elasticsearch:elasticsearch-spark-30_2.12:7.12.1, and now I am getting this error: java.lang.ClassNotFoundException: Failed to find data source: org.elasticsearch.spark.sql.
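
For reference, this error typically surfaces on a data source read like the one below (a minimal sketch; `spark` is the notebook's predefined SparkSession, and the index name is a placeholder):

%scala
// Fails with "Failed to find data source: org.elasticsearch.spark.sql"
// whenever the elasticsearch-spark jar is not on the cluster classpath:
val df = spark.read
  .format("org.elasticsearch.spark.sql")
  .load("some-index")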

@mlo Can you run a test that utilizes Spark 3.0 classes outside of spark.sql?

>     import org.elasticsearch.spark._
>     val conf = ...
>     val sc = new SparkContext(conf)
>     sc.esRDD("radio/artists", "?q=me*")

Also, post your class here for review, along with the output of:

jip list
java -version

And the Elasticsearch version (you posted 7.1 here?), plus an ls of the Elasticsearch home directory.

Hey Matt. Using Elastic 7.11

Also I ran this as you suggested and it seemed to work without error.

%scala
import org.apache.spark.SparkConf
import org.elasticsearch.spark._
val sc = SparkContext.getOrCreate()
sc.esRDD("radio/artists", "?q=me*")

Result:

import org.apache.spark.SparkConf
import org.elasticsearch.spark._
sc: org.apache.spark.SparkContext = org.apache.spark.SparkContext@6b1cbda7
res2: org.apache.spark.rdd.RDD[(String, scala.collection.Map[String,AnyRef])] = ScalaEsRDD[44] at RDD at AbstractEsRDD.scala:41

Also

%sh
java -version

openjdk version "1.8.0_282"
OpenJDK Runtime Environment (Zulu 8.52.0.23-CA-linux64) (build 1.8.0_282-b08)
OpenJDK 64-Bit Server VM (Zulu 8.52.0.23-CA-linux64) (build 25.282-b08, mixed mode)

jip list returns nothing

I put the project together in Java instead of Scala (again, just testing out different environments) and got the org.elasticsearch.spark.sql.api.java.JavaEsSparkSQL class to execute correctly.

pom.xml

<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-spark-30_2.12</artifactId>
    <version>7.13.2</version>
</dependency>

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.12</artifactId>
    <version>3.0.2</version>
    <!-- <scope>provided</scope> -->
</dependency>

Either the packaged jar needs to contain both spark-sql*.jar and elasticsearch-spark*.jar,
or both jars must be installed on every client that executes the class.
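
For the first option, one way to build such an uber-jar is the maven-shade-plugin; a minimal sketch of the pom.xml addition (the plugin version here is illustrative, not from this thread):

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <!-- version is an assumption; use whatever your build standardizes on -->
            <version>3.2.4</version>
            <executions>
                <execution>
                    <!-- bind shading to the package phase so `mvn package` emits the uber-jar -->
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>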

Class --

package hello;

import java.util.Map;
import org.apache.commons.lang.StringUtils;
import org.joda.time.LocalTime;

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.SparkConf;

//import org.apache.spark.sql.api.java.JavaSQLContext;
import org.apache.spark.sql.SQLContext;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

import static org.elasticsearch.spark.rdd.api.java.JavaEsSpark.*;
import org.elasticsearch.spark.sql.api.java.JavaEsSparkSQL;

public class HelloWorld {
    public static void main(String[] args) {

        SparkConf conf = new SparkConf().setAppName("AppName").setMaster("local");
        conf.set("es.index.auto.create", "true");
        conf.set("es.nodes.wan.only", "true");
        conf.set("es.nodes", "https://******.eastus2.azure.elastic-cloud.com");
        conf.set("es.port", "9243");
        conf.set("es.net.http.auth.user", "********");
        conf.set("es.net.http.auth.pass", "*******");

        JavaSparkContext jsc = new JavaSparkContext(conf);

        // Earlier RDD-based test, left commented out:
        // JavaRDD<Map<String, Object>> esRDD =
        //         esRDD(jsc, "filebeat-*/_doc", "?q=test").values();
        // System.out.println(StringUtils.join(esRDD.collect(), ","));

        SQLContext sql = new SQLContext(jsc);

        // test-message is a very simple index on my cloud cluster
        Dataset<Row> people = JavaEsSparkSQL.esDF(sql, "test-message");
        System.out.println("String: " + people.first().toString());
    }
}

This is working in local mode.

I noticed (or got hung up on) that DataFrame moved to Dataset, but that was Spark-related, not a classpath issue with org.elasticsearch.spark.sql.
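
This is expected: since Spark 2.x, DataFrame in Scala is only an alias, so the esDF return type is the same thing under a different name:

// In the org.apache.spark.sql package object (Spark 2.x/3.x):
type DataFrame = Dataset[Row]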

Spark pseudo code -->

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.elasticsearch.spark.sql._

val conf = new SparkConf() // set the same es.* options as in the Java example above
val sc = new SparkContext(conf)
val sql = new SQLContext(sc)
val people = sql.esDF("spark/people")
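
For completeness, the same read through the DataFrameReader API, which is the entry point that failed originally when the connector jar was missing (a sketch; the variable name is illustrative):

// Resolves the connector by its data source name, so it throws
// ClassNotFoundException if elasticsearch-spark is absent:
val people2 = sql.read.format("org.elasticsearch.spark.sql").load("spark/people")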
