Hi,
I have been following the documentation for writing data from Spark/Java into Elasticsearch, described here: https://www.elastic.co/guide/en/elasticsearch/hadoop/current/spark.html#spark-sql
However, every time the documents are written, they contain only metadata and none of the actual data from the RDD.
Is any additional configuration required to write to Elasticsearch from Spark/Java?
I'm using ES v7.5.1 with Spark v2.2.1 and elasticsearch-spark-20_2.11 v7.5.1.
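For reference, es-hadoop is configured through `es.*` properties, set either on the `SparkConf` or passed per call as a `java.util.Map<String, String>` (for example through the `JavaEsSpark.saveToEsWithMeta(rdd, resource, cfg)` overload). The sketch below only builds such a map for a local single-node cluster; the values shown restate the documented defaults (`localhost`, `9200`, `yes`), so it is an assumption-based starting point for explicit configuration, not a known fix for the missing data.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class EsWriteConfig {

    // Builds an es-hadoop settings map for a local, single-node cluster.
    // These values match the documented defaults; adjust for a remote cluster.
    public static Map<String, String> localClusterConfig() {
        Map<String, String> cfg = new HashMap<>();
        cfg.put("es.nodes", "localhost");       // Elasticsearch host(s)
        cfg.put("es.port", "9200");             // HTTP port
        cfg.put("es.index.auto.create", "yes"); // create the target index if missing
        return Collections.unmodifiableMap(cfg);
    }

    public static void main(String[] args) {
        // Print the settings in sorted order for inspection.
        new TreeMap<>(localClusterConfig())
            .forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```

With a pair RDD in scope, the map would be passed as the third argument, e.g. `JavaEsSpark.saveToEsWithMeta(pairRdd, "spark-index", EsWriteConfig.localClusterConfig());`.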
Code:
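import com.google.common.collect.ImmutableList;
import com.google.common.collect.ImmutableMap;
import java.util.Map;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.elasticsearch.spark.rdd.api.java.JavaEsSpark;
import scala.Tuple2;
// session is an existing SparkSession
JavaSparkContext jsc = new JavaSparkContext(session.sparkContext());
// data to be saved
Map<String, ?> otp = ImmutableMap.of("iata", "OTP", "name", "Otopeni");
Map<String, ?> jfk = ImmutableMap.of("iata", "JFK", "name", "JFK NYC");
// create a pair RDD between the id (key) and the document (value)
JavaPairRDD<?, ?> pairRdd = jsc.parallelizePairs(ImmutableList.of(
    new Tuple2<Object, Object>(1, otp),
    new Tuple2<Object, Object>(2, jfk)));
JavaEsSpark.saveToEsWithMeta(pairRdd, "spark-index");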
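If the pair-RDD/metadata route keeps producing empty hits, one alternative documented in the es-hadoop configuration reference is to keep the id inside each document, point `es.mapping.id` at that field, and write with `JavaEsSpark.saveToEs(...)` instead of `saveToEsWithMeta`. The sketch below only builds the documents and the per-write settings with plain JDK maps; swapping Guava's `ImmutableMap` for `LinkedHashMap` is my own assumption (to take one dependency out of the picture), not a known cause, and the field name `id` is hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class AirportDocs {

    // Hypothetical field that holds the document id; es.mapping.id points at it.
    static final String ID_FIELD = "id";

    // One airport document as a plain JDK map, with its id stored inside it.
    static Map<String, Object> airport(int id, String iata, String name) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put(ID_FIELD, id);
        doc.put("iata", iata);
        doc.put("name", name);
        return doc;
    }

    // The two documents from the question.
    public static List<Map<String, Object>> documents() {
        List<Map<String, Object>> docs = new ArrayList<>();
        docs.add(airport(1, "OTP", "Otopeni"));
        docs.add(airport(2, "JFK", "JFK NYC"));
        return docs;
    }

    // Per-write settings: take each document's _id from ID_FIELD.
    public static Map<String, String> writeConfig() {
        return Collections.singletonMap("es.mapping.id", ID_FIELD);
    }

    public static void main(String[] args) {
        documents().forEach(System.out::println);
        System.out.println(writeConfig());
    }
}
```

With a `JavaSparkContext jsc` in scope, the write would then look like `JavaEsSpark.saveToEs(jsc.parallelize(AirportDocs.documents()), "spark-index", AirportDocs.writeConfig());`.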
Documents created:
{
  "_index" : "spark-index",
  "_type" : "_doc",
  "_id" : "2",
  "_score" : 1.0
},
{
  "_index" : "spark-index",
  "_type" : "_doc",
  "_id" : "1",
  "_score" : 1.0
}
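The hits above have no `_source` key at all; a document indexed with no fields would normally still show an empty `_source`. A direct Get API request (`GET /spark-index/_doc/1`) may help tell apart "the source is not stored or was filtered out of the search response" from "the document was written empty"; if `_source` is missing there too, the index mapping (`GET /spark-index/_mapping`) is worth checking for `"_source": {"enabled": false}`. The sketch below uses only `java.net.HttpURLConnection`; the `http://localhost:9200` address and the absence of authentication are assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class GetDocCheck {

    // Builds the Get API URL for one document, e.g. <base>/spark-index/_doc/1
    static String docUrl(String baseUrl, String index, String id) {
        return baseUrl + "/" + index + "/_doc/" + id;
    }

    // Crude check: does the JSON response mention a _source field?
    static boolean hasSource(String json) {
        return json.contains("\"_source\"");
    }

    // Performs a plain HTTP GET and returns the response body (or error body) as text.
    static String get(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("GET");
        int status = conn.getResponseCode();
        InputStream stream = status < 400 ? conn.getInputStream() : conn.getErrorStream();
        if (stream == null) {
            conn.disconnect();
            return "";
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = stream) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        } finally {
            conn.disconnect();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Assumed local, unsecured cluster.
        String body = get(docUrl("http://localhost:9200", "spark-index", "1"));
        System.out.println(body);
        System.out.println(hasSource(body) ? "_source present" : "no _source in response");
    }
}
```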