I am trying to read data from Elasticsearch from PySpark, using the elasticsearch-hadoop API in Spark. The ES cluster sits on AWS EMR, which requires credentials to sign in. My script is as below:
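A minimal sketch of that kind of read, with placeholder endpoint, index, and credential values (assuming the elasticsearch-spark connector jar is on the Spark classpath):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("es-read").getOrCreate()

# All values below are placeholders -- substitute the real endpoint,
# index, and credentials for your cluster.
df = (spark.read
      .format("org.elasticsearch.spark.sql")
      .option("es.nodes", "https://my-es-endpoint.example.com")  # ES endpoint (placeholder)
      .option("es.port", "443")
      .option("es.nodes.wan.only", "true")   # only talk to the listed node(s)
      .option("es.net.ssl", "true")          # use HTTPS
      .option("es.net.https.auth.user", "my_user")      # placeholder user
      .option("es.net.https.auth.pass", "my_password")  # placeholder password
      .load("my-index"))                     # index to read (placeholder)

df.show()
```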
I checked everything and it all made sense except for the user and pass entries: would an AWS access key and secret key work here? We don't want to use the console user and password for security purposes. Please advise! Thanks!
It looks like this client just does not support an AWS connector, and I would need a custom AWS connection to reach the AWS EMR cluster. Do you have any suggestions on how I should do that?
I am not too familiar with integrating with AWS technologies, since we do not test the connector against those environments at all. I will say that the es.net.https.auth.user and es.net.https.auth.pass configurations are packaged together as a Basic HTTP authentication header in Base64 format. If you need alternative headers to be sent, I would specify them by setting them in the configuration: https://www.elastic.co/guide/en/elasticsearch/hadoop/current/configuration.html#_setting_http_request_headers
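For illustration, here is roughly what that looks like in practice. The first part just shows the Basic auth header that the user/pass options produce; the second part assumes the es.net.http.header.* configuration prefix described on the linked page, and the header name and value are placeholders, not an actual AWS signing implementation:

```python
import base64

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("es-read-headers").getOrCreate()

# es.net.https.auth.user / es.net.https.auth.pass are sent as a standard
# Basic auth header: "Authorization: Basic base64(user:pass)".
user, password = "my_user", "my_password"  # placeholders
basic = base64.b64encode(f"{user}:{password}".encode()).decode()
print("Authorization: Basic " + basic)

# If a different header is needed instead, it can be set directly through the
# es.net.http.header.* prefix from the linked configuration docs. The header
# below is purely illustrative.
df = (spark.read
      .format("org.elasticsearch.spark.sql")
      .option("es.nodes", "https://my-es-endpoint.example.com")  # placeholder endpoint
      .option("es.port", "443")
      .option("es.nodes.wan.only", "true")
      .option("es.net.ssl", "true")
      .option("es.net.http.header.X-Custom-Auth", "placeholder-token")  # placeholder header
      .load("my-index"))                                                # placeholder index
```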