Hi,
I am working on an HDFS to Elasticsearch integration via SparkSQL. I am able to read the CSV data from HDFS and create the Elasticsearch index. To create the Elasticsearch index ID I am currently using one of the unique columns from the CSV data. Now my requirement is that the Elasticsearch index ID should be a combination of 2 CSV columns. Is anybody aware of how I could achieve this? I am using the elasticsearch-spark library to create the index. Below is the sample code.
SparkSession sparkSession = SparkSession.builder().config(config).getOrCreate();
SQLContext ctx = sparkSession.sqlContext();

// CSV read options: first row is the header, data lives on HDFS
HashMap<String, String> options = new HashMap<String, String>();
options.put("header", "true");
options.put("path", "hdfs://localhost:9000/test");
Dataset<Row> df = ctx.read().format("com.databricks.spark.csv").options(options).load();

// Index into Elasticsearch, using the "Id" column as the document _id
JavaEsSparkSQL.saveToEs(df, "spark/test", ImmutableMap.of("es.mapping.id", "Id"));
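
For reference, here is roughly what I was thinking of trying (just a sketch, not tested): derive a single column by concatenating the two CSV columns with concat_ws from org.apache.spark.sql.functions, and point es.mapping.id at that derived column. The column names Col1, Col2 and ComboId below are placeholders for my real columns.

import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.concat_ws;

// Build a derived "ComboId" column from the two CSV columns (placeholder names),
// then use it as the Elasticsearch document _id instead of the single "Id" column
Dataset<Row> withId = df.withColumn("ComboId", concat_ws("_", col("Col1"), col("Col2")));
JavaEsSparkSQL.saveToEs(withId, "spark/test", ImmutableMap.of("es.mapping.id", "ComboId"));

Would this be the right approach, or is there a way to tell es.mapping.id to use two columns directly?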
Thanks
Sach