Hello, I am trying to read data from Elasticsearch. My code is as follows:
import java.util.Map;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.elasticsearch.spark.rdd.api.java.JavaEsSpark;

SparkConf sparkConf = new SparkConf()
        .setAppName("Spark ES Integration").setMaster("local");
// .set("spark.ui.port", "7077");
sparkConf.set("es.nodes", "xx.xx.xx.xx");
sparkConf.set("es.port", "9200");
sparkConf.set("es.resource", "blog/post");
sparkConf.set("es.query", "?q=user:dilbert");

JavaSparkContext sc = new JavaSparkContext(sparkConf);
JavaPairRDD<String, Map<String, Object>> esRDD = JavaEsSpark.esRDD(sc);
System.out.println("**********" + esRDD.count()); // Prints 1 - only one record is present
System.out.println("**********" + esRDD.first()); // Throws the exception below
The program outputs the count correctly but throws an exception when the first record is fetched from the RDD.
Why is the request of type POST when I just want to read data?
What is wrong here? The query specified in the configuration executes correctly.
The exception is:
org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: [POST] on [blog/post/_search?search_type=scan&scroll=5m&size=50&preference=_shards:0;_only_node:03oYNb7BTjG2vOzo9lSnzQ] failed; server[null] returned [400|Bad Request:]
.
.
3091 [Executor task launch worker-0] ERROR org.apache.spark.executor.Executor - Exception in task 0.0 in stage 0.0 (TID 0)
org.apache.spark.util.TaskCompletionListenerException: [POST] on [blog/post/_search?search_type=scan&scroll=5m&size=50&preference=_shards:0;_only_node:03oYNb7BTjG2vOzo9lSnzQ] failed; server[null] returned [400|Bad Request:]
.
.
3099 [task-result-getter-0] WARN org.apache.spark.scheduler.TaskSetManager - Lost task 0.0 in stage 0.0 (TID 0, localhost): org.apache.spark.util.TaskCompletionListenerException: [POST] on [blog/post/_search?search_type=scan&scroll=5m&size=50&preference=_shards:0;_only_node:03oYNb7BTjG2vOzo9lSnzQ] failed; server[null] returned [400|Bad Request:]
What version of Elasticsearch and ES-Spark are you using?
POST is used instead of GET to get around some encoding issues with large URIs (basically the search request is passed in the body instead of the URI).
Likely there's something else in the request that causes the 400.
You could enable logging on the REST package to see what requests are made and what causes the exception.
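For reference, with the log4j 1.x setup most Spark deployments use, something like the following in log4j.properties should surface the REST traffic. The package name is taken from the stack trace above (org.elasticsearch.hadoop.rest); adjust if your deployment uses a different logging framework:

```properties
# Log the HTTP requests/responses made by the ES-Hadoop REST layer
log4j.logger.org.elasticsearch.hadoop.rest=TRACE
```

At TRACE level the connector logs the full request bodies, which should show exactly what is being POSTed to _search.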
Hello costin, I managed to resolve my issue. The basic problem was that I was executing my program on a 32-bit Windows machine with a 64-bit winutils.exe. I migrated my code to a Linux machine and now it works fine. Thanks for your help!
This is weird. I do development and testing on Windows and things run fine, and CI builds on Linux - there should be no difference between the two operating systems.
Something else is at hand. Either way, I'm glad to see things are working out.
I am having similar issues with the ES-Spark connector.
When I connect to a remote single-machine ES cluster, I can read the schema from ES.
Example:
val spark14DF = sqlContext.read.format("org.elasticsearch.spark.sql").options(optionsDev).load("blog/post")
But when I call spark14DF.first()
I get the following error:
org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: IndexMissingException[[blog] missing]
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:352)
.....
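For anyone debugging an IndexMissingException like this, a minimal sketch of the same read with the connection options written out explicitly (the host value is a placeholder, not the actual contents of optionsDev, which was not shown; sqlContext is assumed from the snippet above). The es.index.read.missing.as.empty setting is the connector's documented switch for treating a missing index as an empty result instead of an error, which helps distinguish "wrong cluster/index" from other failures:

```scala
// Placeholder connection settings -- substitute your cluster's address.
val optionsDev = Map(
  "es.nodes" -> "xx.xx.xx.xx",  // must point at the cluster that actually holds the "blog" index
  "es.port"  -> "9200",
  // Optional: return an empty DataFrame instead of failing if the index is missing
  "es.index.read.missing.as.empty" -> "true"
)

val spark14DF = sqlContext.read
  .format("org.elasticsearch.spark.sql")
  .options(optionsDev)
  .load("blog/post")
```

If the read comes back empty with this setting enabled, the options are pointing at a cluster (or node) that does not have the blog index, which would explain the IndexMissingException at fetch time.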