How to run ES-Hadoop in Jupyter Notebook (Python or Scala)

Hello there, I'm following a tutorial to connect Elasticsearch and Spark.

I use these commands:

setx PYSPARK_DRIVER_PYTHON ipython
setx PYSPARK_DRIVER_PYTHON_OPTS notebook
pyspark --driver-class-path /path-to-jar-file...

Everything works fine. But I don't want to pass the jar path every time I start Spark. Maybe I can just do:

setx PYSPARK_DRIVER_PYTHON ipython
setx PYSPARK_DRIVER_PYTHON_OPTS notebook
setx SPARK_CLASSPATH /path-to-all-jar-files...
pyspark --driver-class-path %SPARK_CLASSPATH%

But none of this works. I get this error:

Py4JJavaError: An error occurred while calling o139.save.
: java.lang.ClassNotFoundException: Failed to find data source: es. Please find packages at http://spark.apache.org/third-party-projects.html
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:635)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:241)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:225)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: es.DefaultSource
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23$$anonfun$apply$15.apply(DataSource.scala:618)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23$$anonfun$apply$15.apply(DataSource.scala:618)
at scala.util.Try$.apply(Try.scala:192)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23.apply(DataSource.scala:618)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$23.apply(DataSource.scala:618)
at scala.util.Try.orElse(Try.scala:84)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:618)
... 13 more

I would make sure that ES-Hadoop is on all of your classpaths. Alternatively, you could try using the full name of the connector (org.elasticsearch.spark.sql) and give it a shot. I've seen some cases where the short name wasn't usable.
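
For reference, a minimal write sketch using the fully qualified name (the host, port, index name, and sample data here are placeholders, not from your setup):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("es-write-example").getOrCreate()

# hypothetical DataFrame, just for illustration
df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])

# fully qualified data source name instead of the short alias "es"
df.write \
    .format("org.elasticsearch.spark.sql") \
    .option("es.nodes", "localhost") \
    .option("es.port", "9200") \
    .mode("append") \
    .save("myindex/doc")

If this works while format("es") fails, it confirms the jar is on the classpath but the short-name registration isn't being picked up.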

You can specify the path to the jar file when starting pyspark:

pyspark --jars /Path_to_ESHadoop_sparkJar/YourESHadoop_sparkJar.jar
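
If you'd rather not pass the flag on every launch, you should also be able to attach the jar from inside the notebook when you build the session. A sketch along these lines, assuming the session hasn't been created yet (the jar path is a placeholder):

from pyspark.sql import SparkSession

# spark.jars must be set before the session is created;
# it has no effect on an already-running session
spark = SparkSession.builder \
    .appName("es-notebook") \
    .config("spark.jars", "/Path_to_ESHadoop_sparkJar/YourESHadoop_sparkJar.jar") \
    .getOrCreate()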
