Use case:
Query a secured Elasticsearch cluster (HTTPS and basic authentication enabled) from Apache Spark (pyspark and spark-submit).
What I tried:
Start pyspark as follows:
./bin/pyspark --jars ./jars/elasticsearch-hadoop-7.2.0.jar --files /opt/ssl/jkeystore/elastic --driver-class-path /opt/ssl/jkeystore/elastic --conf "spark.executor.extraJavaOptions=-Djavax.net.ssl.trustStore=elastic -Djavax.net.ssl.trustStorePassword=xxxxxx"
Query Elasticsearch as follows:
df = (spark.read.format("org.elasticsearch.spark.sql")
      .option("es.nodes", "https://elasticsearch:9200")
      .option("es.resource", "index/_doc")
      .option("es.read.field.as.array.include", "tags")
      .option("es.net.http.auth.user", "user")
      .option("es.net.http.auth.pass", "password")
      .option("es.net.ssl", "true")
      .load())
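For reference, elasticsearch-hadoop also exposes truststore settings of its own, which I could pass instead of JVM system properties. A sketch of what that submit would look like (the script name, paths, and password are placeholders, and I haven't confirmed this is equivalent):

```shell
# Sketch: ship the truststore to the executors with --files and point the
# connector at it via its own es.net.ssl.* settings (prefixed with "spark."
# so SparkConf forwards them). Paths/password below are placeholders.
./bin/spark-submit \
  --jars ./jars/elasticsearch-hadoop-7.2.0.jar \
  --files /opt/ssl/jkeystore/elastic \
  --conf "spark.es.net.ssl=true" \
  --conf "spark.es.net.ssl.truststore.location=elastic" \
  --conf "spark.es.net.ssl.truststore.pass=xxxxxx" \
  query_es.py
```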
I'm getting the error below:
Caused by: org.elasticsearch.hadoop.rest.EsHadoopTransportException: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
Caused by: javax.net.ssl.SSLHandshakeException: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
Caused by: sun.security.validator.ValidatorException: PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target
It looks like Spark is not picking up the truststore settings.
The elasticsearch-hadoop connector doesn't seem to provide any option to add certificates to a keystore file via secure settings.
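In case it matters, a JKS truststore like the one passed above would typically be built by importing the cluster's CA certificate with keytool. A sketch, assuming the CA certificate is available as ca.crt (the alias, filenames, and password are placeholders):

```shell
# Sketch: import the Elasticsearch cluster's CA certificate into a JKS
# truststore named "elastic" (alias, file names, and password are placeholders).
keytool -import -trustcacerts \
  -alias es-ca \
  -file ca.crt \
  -keystore elastic \
  -storepass xxxxxx \
  -noprompt
```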
How do I configure it correctly so that I can talk to Elasticsearch?