Connecting Spark and Elasticsearch


(Sg87) #1

Hi,

I am using Spark SQL 2.1.0 from Python (PySpark) with Elasticsearch 5.5.0, and I am trying to run the following command:
df.write.format("es").save("db/test")

My configuration options are as follows:
$SPARK_HOME/bin/pyspark --packages org.elasticsearch:elasticsearch-spark-20_2.11:5.3.1 --conf spark.es.nodes="52.XXX.XX.XX" --conf spark.es.port="9201" --conf spark.es.nodes.discovery=true --conf spark.es.net.http.auth.user="user" --conf spark.es.net.http.auth.pass="Password" --conf spark.es.nodes.wan.only=true

(I tried both spark.es.nodes.wan.only=true and spark.es.nodes.wan.only=false.)

This worked perfectly fine before, against an Elasticsearch 2.4.5 cluster. Now I receive the following error:
An error occurred while calling o50.save.
: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: [HEAD] on [db/test] failed; server[52.XXX.XX.XX:9201] returned [400|Bad Request:]
at org.elasticsearch.hadoop.rest.RestClient.checkResponse(RestClient.java:505)
at org.elasticsearch.hadoop.rest.RestClient.executeNotFoundAllowed(RestClient.java:476)
at org.elasticsearch.hadoop.rest.RestClient.exists(RestClient.java:537)
at org.elasticsearch.hadoop.rest.RestRepository.isEmpty(RestRepository.java:473)
at org.elasticsearch.spark.sql.ElasticsearchRelation.isEmpty(DefaultSource.scala:508)
at org.elasticsearch.spark.sql.DefaultSource.createRelation(DefaultSource.scala:96)
at org.apache.spark.sql.execution.datasources.DataSource.write(DataSource.scala:518)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:215)
at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:198)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:280)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:214)
at java.lang.Thread.run(Thread.java:745)

I also tried to connect using a private and/or public key, with the following configuration settings:
--conf spark.es.net.ssl.keystore.location="/Users/stijngeuens/.ssh/key" --conf spark.es.net.ssl=true

In this case I received the following error:
An error occurred while calling o51.save.
: org.elasticsearch.hadoop.rest.EsHadoopTransportException: javax.net.ssl.SSLException: Unrecognized SSL message, plaintext connection?

Could anybody help on this matter?


(James Baiera) #2

Try upgrading to ES-Hadoop 5.5.0. Elasticsearch removed an old, deprecated endpoint for checking the existence of types, and the connector was updated to account for that in the ES-Hadoop 5.5.0 release.
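In practical terms, the upgrade amounts to bumping the connector coordinate in `--packages` to match the cluster version. A sketch based on the launch command from the original post (the host, port, user, and password values are the poster's placeholders, not real settings):

```shell
# Match the ES-Hadoop connector version to the Elasticsearch cluster (5.5.0),
# replacing the older 5.3.1 coordinate that triggered the 400 Bad Request.
$SPARK_HOME/bin/pyspark \
  --packages org.elasticsearch:elasticsearch-spark-20_2.11:5.5.0 \
  --conf spark.es.nodes="52.XXX.XX.XX" \
  --conf spark.es.port="9201" \
  --conf spark.es.nodes.discovery=true \
  --conf spark.es.net.http.auth.user="user" \
  --conf spark.es.net.http.auth.pass="Password" \
  --conf spark.es.nodes.wan.only=true
```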


(Sg87) #3

Thanks. I am able to connect.


(system) #4

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.