Unable to connect to AWS Elastic-Search Instance through AWS Spark EMR

Hi Team,

We have requirement where, we need to load the parquet file from S3 bucket and ingest this data in to the AWS Elastic-Search using Spark (Scala) Job.

To Achieve this i have tried in my local system by installing Elastic Search in my local machine and i am able to ingest the data in to Elastic-Search (Elastic-Search which is running in local machine) using Spark-Scala job.

Local Machine:

Here is the working code to connect to the Local Elastic Search...

object ElasticSearchWriteSample {
def main(args: Array[String]) {

    val sparkSession = SparkSession.builder().appName("WriteToElasticSearch").master("local[*]").getOrCreate()
            sparkSession.conf.set("spark.sql.parquet.binaryAsString", "true")
            val filePath = args(0) // Here i will get the file path through vargs
            val dataFrame = sparkSession.read.parquet(filePath)

            dataFrame.write
              .format("org.elasticsearch.spark.sql")
              .option("es.port", "9200")
              .option("es.nodes", "localhost")
              .mode(SaveMode.Append)
              .save("tcrref_testdb/doc")

}
}

I am able to Push the data in to Elastic Search through the Spark Job in my local system to the local machine elastic-search.

In AWS:

Same thing i tried by running spark job in EMR which will connects the AWS-Elastic-Search Instance. But it is unable to connect to the Elastic Search which is running inn AWS instance.

Here the code...

object ElasticSearchWriteSample {
def main(args: Array[String]) {

    val sparkSession = SparkSession.builder().appName("WriteToElasticSearch").master("cluster").getOrCreate()
            sparkSession.conf.set("spark.sql.parquet.binaryAsString", "true")
            val filePath = args(0) // Here i will get the file path through vargs
            val dataFrame = sparkSession.read.parquet(filePath)

            dataFrame.write
              .format("org.elasticsearch.spark.sql")
              .option("es.port", "9200")
              .option("es.nodes", "https://*****.[amazonaws.com](http://amazonaws.com/)")
              .option("es.nodes.wan.only", "true")
              .mode(SaveMode.Append)
              .save("tcrref_testdb/doc")

}
}

This AWS job is not working.

Could you please help us, how to connect to the AWS Elastic-Search instance from EMR.

I would recommended you raise this with AWS support as they are aware and f the changes and limitations of their platform/version of Elasticsearch.

Sure, I will check with AWS Team.

But just curious if it is working with Local Mode it should work for AWS as well.

I feel my config are correct, suggest me if i am missing anything else while doing any config, so that i can contact the AWS team.

          .option("es.port", "9200")
          .option("es.nodes", "https://*****.[amazonaws.com](http://amazonaws.com/)")
          .option("es.nodes.wan.only", "true")

I read the elastic docx as well and there are no specific config to connect with AWS apart from .option("es.nodes.wan.only", "true") for AWS.

Yes, I contacted with AWS team, now i am able to connect the AWS-ES. The fix we need to add the Security Access Rule in the Security Groups in the EMR Cluster.
And additional Config are
.option("es.nodes.wan.only","true")
.option("es.net.ssl","true")

Thank you.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.