Using Elasticsearch Spark adapter in Jupyter notebooks with Python kernel

(michele crudele) #1


In the past I used the Elasticsearch Spark adapter in Jupyter notebooks with the Scala kernel, adding the dependency with %AddJar file:///.../elasticsearch-spark_2.10-2.1.0.BUILD-SNAPSHOT.jar

I now need to port my notebooks to Python, using the Python kernel. Is a Python binding available for the adapter? If so, how can I specify the dependency in the notebook? (%AddDeps and %AddJar are not available for the Python kernel.)
I'd be grateful if you could point me to any documentation or a sample Jupyter notebook that could help.

Thanks a lot,

  • Michele

(Costin Leau) #2

ES-Hadoop/Spark is available only for the JVM; there is no native Python binding for it.
I'm not familiar enough with Python, but you could work with ES by relying on the Input/OutputFormat, that is, by pulling in the Map/Reduce layer as explained in the ES-Hadoop documentation.
Note that this is still standard Spark; in fact, it is Spark that picks up the formats and uses them internally.
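
For readers following along in a Python notebook, here is a minimal sketch of what the Input/OutputFormat route might look like from PySpark. It assumes the elasticsearch-hadoop jar is on disk and that the notebook kernel honours the `PYSPARK_SUBMIT_ARGS` environment variable, which is one common way to pass `--jars` when `%AddJar` is not available. The jar path, the index name, and the helper names (`submit_args`, `es_read_conf`, `read_from_es`) are hypothetical placeholders, not anything from this thread.

```python
import os

# Hypothetical placeholder: point this at your own elasticsearch-hadoop jar.
ES_HADOOP_JAR = "/path/to/elasticsearch-hadoop.jar"


def submit_args(jar_path):
    """Build a PYSPARK_SUBMIT_ARGS value that adds the jar to the Spark classpath."""
    return "--jars {} pyspark-shell".format(jar_path)


def es_read_conf(resource, nodes="localhost", port="9200", query=None):
    """Build the es.* settings that EsInputFormat reads from the job configuration."""
    conf = {"es.resource": resource, "es.nodes": nodes, "es.port": port}
    if query is not None:
        conf["es.query"] = query
    return conf


def read_from_es(sc, conf):
    """Ask Spark to read from Elasticsearch through the Map/Reduce InputFormat."""
    return sc.newAPIHadoopRDD(
        inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
        keyClass="org.apache.hadoop.io.NullWritable",
        valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
        conf=conf,
    )


# Assumed notebook usage (needs Spark and a reachable Elasticsearch node):
#   os.environ["PYSPARK_SUBMIT_ARGS"] = submit_args(ES_HADOOP_JAR)  # before creating sc
#   from pyspark import SparkContext
#   sc = SparkContext(appName="es-notebook")
#   rdd = read_from_es(sc, es_read_conf("myindex/mytype"))  # hypothetical index/type
#   print(rdd.first())
```

The environment variable has to be set before the first SparkContext is created, since the JVM classpath is fixed at that point. If your notebook setup already creates `sc` for you, the jar would instead need to be supplied wherever that kernel is configured.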

(michele crudele) #3

Thanks Costin, I'll try the Map/Reduce layer.
What are the benefits of using it compared with using the elasticsearch-py Python library directly in my notebooks?

(Costin Leau) #4

The docs cover this aspect as well.
