Cannot connect Spark to Elasticsearch *RESOLVED*

Hi,

I am trying to connect my Spark to Elasticsearch to read log data, but I get all sorts of weird error messages. I would be happy if someone could give me a pointer to some complete documentation.

spark-shell --master mesos://spark-m00:5050 --jars /images/spark_jars/elasticsearch-hadoop-2.1.1.jar --conf spark.es.nodes=sidr35dgapan001 --conf spark.es.port=9200 --conf spark.es.nodes.discovery=false --conf spark.es.http.timeout=5m --conf spark.es.resource=logstash-2015.09.22/bjoern

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.elasticsearch.spark.rdd.EsSpark
import org.elasticsearch.spark._
import org.elasticsearch.hadoop.util.ObjectUtils

val bjoernRDD = sc.esRDD("logstash-2015.09.22/bjoern")

java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
at org.elasticsearch.spark.rdd.EsSpark$.esRDD(EsSpark.scala:26)
at org.elasticsearch.spark.package$SparkContextFunctions.esRDD(package.scala:20)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:30)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:35)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:37)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:39)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:41)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:43)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:45)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:47)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:49)
at $iwC$$iwC$$iwC.<init>(<console>:51)
at $iwC$$iwC.<init>(<console>:53)
at $iwC.<init>(<console>:55)

BTW, here are the versions I am on:

Spark: 1.5 (spark-1.5.0-bin-hadoop2.6)
ES: 1.7.2

#Resolved

After a lot of reading and failing, I have a Spark configuration that connects to and reads from ES. What confused me when downloading the jars was that there is a whole bunch of them, but you only need one. Adding some comments here in case someone runs into the same issue.

####elasticsearch-hadoop-2.1.1.jar

In addition, I had a bit of junk in my root folder which I had to clean out. These were the steps that got my Spark V1.5 connected to ES V1.7.2:

####execute the following commands as the root user

cd /
rm -rf .ivy2

export CLASSPATH=/images/spark_jars/elasticsearch-hadoop-2.1.1.jar

spark-shell --master local[4] --jars /images/spark_jars/elasticsearch-hadoop-2.1.1.jar --conf spark.es.nodes=spark-m00 --conf spark.es.port=9200 --conf spark.es.nodes.discovery=false --conf spark.es.http.timeout=5m --conf spark.es.resource=logstash-2015.09.22/bjoern
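In case it helps anyone puzzled by the `--conf spark.es.*` flags above: Spark only forwards properties prefixed with `spark.`, and the connector strips that prefix to recover its own `es.*` settings (`spark.es.nodes` becomes `es.nodes`, and so on). A minimal pure-Scala sketch of that prefix handling, purely illustrative and not the connector's actual code:

```scala
// Settings as they would be passed on the spark-shell command line.
val sparkConf = Map(
  "spark.es.nodes" -> "spark-m00",
  "spark.es.port"  -> "9200"
)

// Keep only "spark.es.*" keys and strip the "spark." prefix,
// yielding the "es.*" option names the connector understands.
val esOptions = sparkConf.collect {
  case (k, v) if k.startsWith("spark.es.") => k.stripPrefix("spark.") -> v
}

println(esOptions) // Map(es.nodes -> spark-m00, es.port -> 9200)
```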

####When the Spark shell is available enter these Scala lines

import org.apache.spark.sql.SQLContext
import org.elasticsearch.spark
import org.elasticsearch.spark.sql
import org.elasticsearch.hadoop

val df = sqlContext.read.format("org.elasticsearch.spark.sql").load("logstash-2015.09.23/bjoern")
df.printSchema()
df.show()

The result is below, and I can massage it accordingly.

+--------------------+--------+---------------+-----+--------------------+-----------------+------+------+
| @timestamp|@version| file|geoip| host| message|offset| type|
+--------------------+--------+---------------+-----+--------------------+-----------------+------+------+
|2015-09-23 13:57:...| 1|/tmp/bjoern.log| null|spark-c01|spark driver main| 51876|bjoern|
|2015-09-23 13:57:...| 1|/tmp/bjoern.log| null|spark-c01|spark driver main| 51966|bjoern|
|2015-09-23 13:57:...| 1|/tmp/bjoern.log| null|spark-c01|spark driver main| 52020|bjoern|
|2015-09-23 13:57:...| 1|/tmp/bjoern.log| null|spark-c01|spark driver main| 52038|bjoern|
|2015-09-23 13:57:...| 1|/tmp/bjoern.log| null|spark-c01|spark driver main| 52092|bjoern|
|2015-09-23 13:57:...| 1|/tmp/bjoern.log| null|spark-c01|spark driver main| 52182|bjoern|
|2015-09-23 13:57:...| 1|/tmp/bjoern.log| null|spark-c01|spark driver main| 52272|bjoern|
|2015-09-23 13:57:...| 1|/tmp/bjoern.log| null|spark-c01|spark driver main| 52362|bjoern|
|2015-09-23 13:57:...| 1|/tmp/bjoern.log| null|spark-c01|spark driver main| 52452|bjoern|
|2015-09-23 13:57:...| 1|/tmp/bjoern.log| null|spark-c01|spark driver main| 52524|bjoern|
|2015-09-23 13:57:...| 1|/tmp/bjoern.log| null|spark-c01|spark driver main| 52614|bjoern|
|2015-09-23 13:57:...| 1|/tmp/bjoern.log| null|spark-c01|spark driver main| 52704|bjoern|
|2015-09-23 13:57:...| 1|/tmp/bjoern.log| null|spark-c01|spark driver main| 52794|bjoern|
|2015-09-23 13:57:...| 1|/tmp/bjoern.log| null|spark-c01|spark driver main| 52884|bjoern|
+--------------------+--------+---------------+-----+--------------------+-----------------+------+------+
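One thing to watch: the resource name encodes the day (the shell above was started with `logstash-2015.09.22/bjoern` while the read uses `logstash-2015.09.23/bjoern`), because Logstash writes one index per day named `logstash-yyyy.MM.dd`. A small hypothetical helper (not part of es-hadoop) to build the `<index>/<type>` resource string for a given day:

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Build the es-hadoop resource string "<index>/<type>" for a daily
// Logstash index, which uses the "logstash-yyyy.MM.dd" naming scheme.
def esResource(day: LocalDate, docType: String): String = {
  val idx = day.format(DateTimeFormatter.ofPattern("yyyy.MM.dd"))
  s"logstash-$idx/$docType"
}

println(esResource(LocalDate.of(2015, 9, 22), "bjoern")) // logstash-2015.09.22/bjoern
```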


Glad to hear you sorted things out. For best results, Spark 1.5 is currently supported in the [dev builds][1], namely the upcoming 2.1.2 and 2.2.0.m2.

Cheers,
[1]: https://www.elastic.co/guide/en/elasticsearch/hadoop/current/install.html#download-dev