Continuing the discussion from EsHadoopInvalidRequest: [POST]:
I am having issues with the ES Spark Connector.
When I connect to a single-machine remote ES cluster, I get the error shown below. I can read an index and get the schema from the DataFrame created with this command:
val options = Map("pushdown" -> "true", "es.nodes" -> "machine1", "es.port" -> "9200")
sqlContext.read.format("org.elasticsearch.spark.sql").options(options).load("index/type")
org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: ElasticsearchIllegalArgumentException[No data node with id[5_JV4ZDkRweIZcP79cNNOA] found]
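In my experience (an assumption, not something stated in the thread), a "No data node with id [...] found" error can appear when the connector discovers the cluster's internal node addresses and then cannot reach them from the Spark driver or executors. elasticsearch-hadoop has a setting, es.nodes.wan.only, that routes all traffic through the configured nodes instead. A minimal sketch, reusing the host name from the post:

```scala
// Sketch: the same read options as above, plus es.nodes.wan.only, which
// tells elasticsearch-hadoop to talk only to the nodes listed in es.nodes
// instead of the data-node addresses it discovers from the cluster.
val wanOptions = Map(
  "pushdown"          -> "true",
  "es.nodes"          -> "machine1",   // host from the original example
  "es.port"           -> "9200",
  "es.nodes.wan.only" -> "true"
)

// val df = sqlContext.read
//   .format("org.elasticsearch.spark.sql")
//   .options(wanOptions)
//   .load("index/type")
```

Whether this helps depends on whether the data node's published address is actually unreachable from Spark; it is a cheap thing to try first.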
When I connect to a different single-machine remote ES cluster, I can read the schema from ES.
Example:
val optionsDev = Map("pushdown" -> "true", "es.nodes" -> "machineDev", "es.port" -> "9200")
val spark14DF = sqlContext.read.format("org.elasticsearch.spark.sql").options(optionsDev).load("blog/post")
spark14DF: org.apache.spark.sql.DataFrame = [body: string, postDate: timestamp, title: string, user: string]
spark14DF.printSchema
root
|-- body: string (nullable = true)
|-- postDate: timestamp (nullable = true)
|-- title: string (nullable = true)
|-- user: string (nullable = true)
But when I call spark14DF.first(), I get the following error:
org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: IndexMissingException[[blog] missing]
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:352)
.....
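One possible explanation worth ruling out (my guess, not confirmed in the thread): the schema request goes through the host named in es.nodes, but once tasks actually run, the connector contacts the data nodes it discovered from the cluster, which may not be the nodes you expect. Disabling discovery with the es.nodes.discovery setting pins every request to the configured host, as in this sketch:

```scala
// Sketch: the machineDev options with client-side node discovery turned
// off, so every request goes to the host listed in es.nodes.
val pinnedOptions = Map(
  "pushdown"           -> "true",
  "es.nodes"           -> "machineDev",  // host from the original example
  "es.port"            -> "9200",
  "es.nodes.discovery" -> "false"
)

// val spark14DF = sqlContext.read
//   .format("org.elasticsearch.spark.sql")
//   .options(pinnedOptions)
//   .load("blog/post")
// spark14DF.first()
```

If first() succeeds with discovery off, the discovered node addresses are the likely culprit.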
These same commands work fine against my localhost Elasticsearch cluster, which consists of three nodes. The previous two examples each use only one node, yet they give different errors, so there must be something else I don't understand.
Thank you
costin (Costin Leau)
December 18, 2015, 12:39pm
Likely your localhost cluster is configured differently than the remote clusters.
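One way to compare the clusters' configurations is to query Elasticsearch's nodes info API on each cluster and diff the network settings. A small helper sketch (the host is the one from the thread; the HTTP call itself is left commented out since it needs a live cluster):

```scala
// Sketch: build the _nodes info URL for a cluster. GET-ing it returns each
// node's bound and published HTTP addresses, which is useful for spotting
// published addresses that are unreachable from the Spark machines and
// would break client-side node discovery.
def nodesInfoUrl(host: String, port: Int = 9200): String =
  s"http://$host:$port/_nodes/http"

// Compare, e.g.:
//   scala.io.Source.fromURL(nodesInfoUrl("machineDev")).mkString
// against the same call for the localhost cluster.
```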