Hi there,
We are looking for the simplest and fastest way to get data from HDFS into Elasticsearch (ES).
One method we have been trying, without any luck, is ES-Hadoop with Spark.
The component versions we are using are listed below.
Versions:
ES - 1.7.3
Spark - 1.5.2
Scala - 2.10.4
JAVA - 1.7.0_67
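For reference, here is a minimal sketch of how the versions above could be confirmed from inside the running shell. It assumes it is pasted into spark-shell, where `sc` is already defined; it is not part of the session that produced the error.

```scala
// Assumption: run inside spark-shell, which predefines `sc` (the SparkContext).
// Scala version the shell itself is running on, e.g. "2.10.4".
println("Scala: " + scala.util.Properties.versionNumberString)
// Spark version reported by the active SparkContext.
println("Spark: " + sc.version)
// JVM version the shell was started with.
println("Java:  " + System.getProperty("java.version"))
```

The Scala version printed here can be compared with the `_2.xx` suffix in the connector jar file name.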
-# We start the Spark shell with the following jar files.
./spark-shell --jars esjava/elasticsearch-spark_2.11-2.1.2.jar esjava/elasticsearch-hadoop-mr-2.1.2.jar elasticsearch-hadoop-2.1.2.jar
-# Below we import the following classes.
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.elasticsearch.spark._
import org.apache.spark.SparkConf
import org.elasticsearch.spark.rdd.EsSpark
import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.rdd.RDD
import org.elasticsearch.spark.sql._
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.SQLContext._
-# Below we define our Elasticsearch node.
val conf = new SparkConf()
conf.set("es.nodes","hostname:9200")
-# Below we point to the JSON file and convert it to a DataFrame.
val sqlContext = new SQLContext(sc)
val df = sqlContext.jsonFile("hdfs://namenode/tmp/2015-11-10.json")
-# Below we validate the schema (printSchema prints by itself, so no println is needed).
df.printSchema()
-# And below we save it to Elasticsearch.
df.saveToEs("test/parquet")
-# Right after that we get the following error, and we are not sure what we are doing wrong.
Error:
java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
at org.elasticsearch.spark.sql.EsSparkSQL$.saveToEs(EsSparkSQL.scala:42)
at org.elasticsearch.spark.sql.package$SparkDataFrameFunctions.saveToEs(package.scala:25)
Any help is appreciated.