Composite key for creating an Elasticsearch index

(Sach) #1

I am working on an HDFS-to-Elasticsearch integration via Spark SQL. I am able to read the CSV data from HDFS and create the Elasticsearch index. For the Elasticsearch document ID I am currently using one of the unique columns from the CSV data. My requirement now is that the ID should be a combination of two CSV columns. Does anybody know how I can achieve this? I am using the elasticsearch-spark library to create the index. Below is the sample code.

SparkSession sparkSession = SparkSession.builder().config(config).getOrCreate();
SQLContext ctx = sparkSession.sqlContext();
HashMap<String, String> options = new HashMap<String, String>();
options.put("header", "true");
options.put("path", "hdfs://localhost:9000/test");
Dataset<Row> df = ctx.read().format("com.databricks.spark.csv").options(options).load();
JavaEsSparkSQL.saveToEs(df, "spark/test", ImmutableMap.of("es.mapping.id", "Id"));


(James Baiera) #2

In this case, the best option would be to use a Spark transformation to create the composite ID as a field on each record, and tell the connector to use that new composite field as the document's ID.
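
A minimal sketch of that approach, under some assumptions: the two CSV columns are called "Id" and "Name" ("Name" is a hypothetical second column, as is the new "compositeId" field and the "_" separator), Spark 2.x with its built-in CSV reader is in use, and the job is launched through spark-submit so the session picks up its master and Elasticsearch settings from there. It uses `concat_ws` to build the new field and the connector's `es.mapping.id` setting to point at it.

```java
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.concat_ws;

import java.util.HashMap;
import java.util.Map;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.elasticsearch.spark.sql.api.java.JavaEsSparkSQL;

public class CompositeIdExample {

    // Adds a "compositeId" column by joining two existing columns with "_".
    // The new column name and the separator are illustrative choices.
    static Dataset<Row> withCompositeId(Dataset<Row> df, String first, String second) {
        return df.withColumn("compositeId", concat_ws("_", col(first), col(second)));
    }

    public static void main(String[] args) {
        // Assumes master and es.* connection settings are supplied by spark-submit.
        SparkSession spark = SparkSession.builder().getOrCreate();

        // Read the CSV with a header row, as in the question.
        Dataset<Row> df = spark.read()
                .option("header", "true")
                .csv("hdfs://localhost:9000/test");

        // "Id" comes from the question; "Name" is a hypothetical second column.
        Dataset<Row> withId = withCompositeId(df, "Id", "Name");

        // Tell the connector to use the new field as the document ID.
        Map<String, String> cfg = new HashMap<>();
        cfg.put("es.mapping.id", "compositeId");

        JavaEsSparkSQL.saveToEs(withId, "spark/test", cfg);
    }
}
```

Two things to watch with this sketch: `concat_ws` skips null values, so a row with a null in one of the columns ends up with an ID made from the other column alone, and the separator should be a character that cannot occur in either column, otherwise two different pairs could produce the same ID.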
