After running this, I'm getting the error below:
Py4JJavaError: An error occurred while calling o43.load.
: java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider org.elasticsearch.spark.sql.DefaultSource15 could not be instantiated
at java.base/java.util.ServiceLoader.fail(ServiceLoader.java:586)
at java.base/java.util.ServiceLoader$ProviderImpl.newInstance(ServiceLoader.java:813)
at java.base/java.util.ServiceLoader$ProviderImpl.get(ServiceLoader.java:729)
at java.base/java.util.ServiceLoader$3.next(ServiceLoader.java:1403)
at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:46)
at scala.collection.Iterator.foreach(Iterator.scala:943)
at scala.collection.Iterator.foreach$(Iterator.scala:943)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
at scala.collection.IterableLike.foreach(IterableLike.scala:74)
at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
at scala.collection.TraversableLike.filterImpl(TraversableLike.scala:303)
at scala.collection.TraversableLike.filterImpl$(TraversableLike.scala:29
As you can see in my code, I have already added the dependency, but I'm still having this issue.
Also, for Spark 3.3.0 you'll want to use either elasticsearch-spark-30_2.12-8.6.1.jar or elasticsearch-spark-30_2.13-8.6.1.jar. The 20 in your version refers to Spark 2.x, and the 2.11 refers to Scala 2.11 (Spark 3.3.0 uses Scala 2.12 or 2.13, I believe). That's not the cause of your current problem, but it will be the next one -- Spark and Scala artifacts generally aren't compatible across versions.
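If you're not sure which Scala build your Spark is running, here's a quick check from PySpark. This is just a sketch -- the _jvm attribute is an internal PySpark handle, so treat it as a debugging trick rather than a supported API:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()
    print(spark.version)  # e.g. 3.3.0 -> use an elasticsearch-spark-30 artifact
    # Scala version the JVM classes were built against, e.g. "version 2.12.15"
    print(spark.sparkContext._jvm.scala.util.Properties.versionString())

If that prints 2.12.x, pick the _2.12 jar; if 2.13.x, pick the _2.13 jar.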
Thanks for responding. As you can see in my post, I'm using a Jupyter notebook and creating the Spark session there; I posted the code snippet.
Is there any good end-to-end documentation on connecting PySpark and Elasticsearch, especially on a local machine (single-node cluster)?
You didn't mention how you're starting Jupyter, but you probably just need to pass that --jars /home/elastic/elasticsearch-spark-30_2.12-8.6.0.jar piece to the Jupyter notebook when starting it up, probably via the PYSPARK_DRIVER_PYTHON_OPTS environment variable. The exact way to do that might be better answered by the Jupyter notebook community, though -- I've only used es-spark directly through pyspark myself.
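As an alternative, a minimal sketch of attaching the jar from inside the notebook itself, before any session exists (the jar path is the one from your post -- adjust to wherever your jar actually lives):

    from pyspark.sql import SparkSession

    # spark.jars must be set before the session is created; it has no effect
    # on an already-running SparkSession.
    spark = (
        SparkSession.builder
        .appName("es-local")
        .config("spark.jars", "/home/elastic/elasticsearch-spark-30_2.12-8.6.0.jar")
        .getOrCreate()
    )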
I have added the required dependencies; however, now I'm getting a different error.
Py4JJavaError: An error occurred while calling o43.load.
: java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: org.elasticsearch.spark.sql.DefaultSource15 Unable to get public no-arg constructor
at java.base/java.util.ServiceLoader.fail(ServiceLoader.java:586)
at java.base/java.util.ServiceLoader.getConstructor(ServiceLoader.java:679)
at java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.hasNextService(ServiceLoader.java:1240)
at java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.hasNext(ServiceLoader.java:1273)
at java.base/java.util.ServiceLoader$2.hasNext(ServiceLoader.java:1309)
at java.base/java.util.ServiceLoader$3.hasNext(ServiceLoader.java:1393)
at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
at scala.collection.Iterator.foreach(Iterator.scala:943)
I am guessing that your Spark and/or Scala version doesn't match your es-spark jar version. For Spark 3.3.0 you want elasticsearch-spark-30_2.12-8.6.1 or elasticsearch-spark-30_2.13-8.6.1, depending on whether your Spark is using Scala 2.12 or 2.13. Check out Python, Elasticsearch and Apache Spark, Simple data reading - #2 by Keith_Massey.
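Once the versions line up, a read against a local single-node Elasticsearch looks roughly like this. A hedged sketch, not a definitive recipe: the index name my-index is an assumption, and you may need extra es.net.* options if security is enabled on your cluster:

    # Assumes the matching connector jar is already on the classpath.
    df = (
        spark.read.format("org.elasticsearch.spark.sql")
        .option("es.nodes", "localhost")
        .option("es.port", "9200")
        .option("es.nodes.wan.only", "true")  # often needed for a single local node
        .load("my-index")                     # hypothetical index name
    )
    df.show()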