EsHadoopInvalidRequest: [POST]


#1

Hello I am trying to read data from ElasticSearch. My Code is as follows,

	SparkConf sparkConf = new SparkConf()
			.setAppName("Spark ES Integration").setMaster("local");
	// .set("spark.ui.port", "7077");
	sparkConf.set("es.nodes", "xx.xx.xx.xx");
	sparkConf.set("es.port", "9200");
	sparkConf.set("es.resource", "blog/post");
	sparkConf.set("es.query", "?q=user:dilbert");
	JavaSparkContext sc = new JavaSparkContext(sparkConf);

	JavaPairRDD<String, Map<String, Object>> esRDD = JavaEsSpark.esRDD(sc);
	System.out.println("**********" + esRDD.count()); // Prints 1 - Only one record is present
	System.out.println("**********" + esRDD.first()); // Throws exception 

Program outputs count correctly but throws exception when first record fetched from RDD.
Why request is of type POST when I just want to read the data?
What is wrong here? Query mentioned in configuration executes correctly.

Exception is

 org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: [POST] on [blog/post/_search?search_type=scan&scroll=5m&size=50&preference=_shards:0;_only_node:03oYNb7BTjG2vOzo9lSnzQ] failed; server[null] returned [400|Bad Request:]
.
.
3091 [Executor task launch worker-0] ERROR org.apache.spark.executor.Executor - Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.spark.util.TaskCompletionListenerException: [POST] on [blog/post/_search?search_type=scan&scroll=5m&size=50&preference=_shards:0;_only_node:03oYNb7BTjG2vOzo9lSnzQ] failed; server[null] returned [400|Bad Request:]
.
.
3099 [task-result-getter-0] WARN  org.apache.spark.scheduler.TaskSetManager - Lost task 0.0 in stage 0.0 (TID 0, localhost): org.apache.spark.util.TaskCompletionListenerException: [POST] on [blog/post/_search?search_type=scan&scroll=5m&size=50&preference=_shards:0;_only_node:03oYNb7BTjG2vOzo9lSnzQ] failed; server[null] returned [400|Bad Request:]

(Costin Leau) #2

What version of Elasticsearch and ES-Spark are you using?
POST is used instead of GET to get around some encoding issues with large URIs (basically the search request is passed in the body instead of the URI).

Likely there's something else in the request that causes the 400.
Potentially you can enable logging on the REST package to see what requests are made and what causes the exception.


#3

ElasticSearch version is 1.7
ES-Hadoop version is 2.2.0


(Costin Leau) #4

I assume that is 2.2-m1. Can you please post the log as a gist somewhere? Thanks,


#5

Hello costin, I managed to resolve my issue. The basic problem was I was executing my program on windows 32 bit machine and I was having 64 bit Winutils.exe. I migrated my code to linux machine. Now it works fine. Thanks for ur help!


(Costin Leau) #6

Why was Winutils.exe used by Spark or Hadoop? And how did it affect the REST requests?


#7

Actually I am executing it on Windows machine. The same code executes fine on Linux VM.
Thanks for the help


(Costin Leau) #8

This is weird. I do development and testing on Windows and things run fine. The CI builds on Linux - there should be no difference between the two OS.
Something else is at hand. Either way, I'm glad to see things are working out.


(Pinhus Dash) #9

I am having similar issues with the ES Spark Connector.

When I connect to a remote 1 machine ES cluster, I can read the schema from ES.
Example:
val spark14DF = sqlContext.read.format("org.elasticsearch.spark.sql").options(optionsDev).load("blog/post")

spark14DF: org.apache.spark.sql.DataFrame = [body: string, postDate: timestamp, title: string, user: string]

spark14DF.printSchema
root
|-- body: string (nullable = true)
|-- postDate: timestamp (nullable = true)
|-- title: string (nullable = true)
|-- user: string (nullable = true)

But when I ask for spark14DF.first()
I get the following error:
org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: IndexMissingException[[blog] missing]
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:352)
.....


ElasticSearch Spark Hadoop Connector
(Costin Leau) #10

@Pinhus_Dash Can you please open up a separate thread or potentially issue on Github? More information here


(system) #11