Hi
We're trying to use the elasticsearch-spark connector (elasticsearch-hadoop) to load data from an Elasticsearch index into Spark for processing.
We load the data using:
val df = sparkSession
  .read
  .format("es")
  .option("pushdown", "false")
  .option("es.nodes", nodesUrl)
  .option("es.port", "9200")
  .load(indexName)
When we run it, we get the following error:
org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: An HTTP line is larger than 4096 bytes.
{"query":{"match_all":{}}}
at org.elasticsearch.hadoop.rest.RestClient.checkResponse(RestClient.java:505)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:463)
at org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:445)
at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:365)
at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:92)
at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:61)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.util.random.SamplingUtils$.reservoirSampleAndCount(SamplingUtils.scala:41)
at org.apache.spark.RangePartitioner$$anonfun$9.apply(Partitioner.scala:263)
at org.apache.spark.RangePartitioner$$anonfun$9.apply(Partitioner.scala:261)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:844)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsWithIndex$1$$anonfun$apply$26.apply(RDD.scala:844)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
We tried to debug this and saw that the request URL sent to Elasticsearch includes the "_source" field list as a URL parameter rather than inside the body of the request.
The documents in the index we're reading have 350 fields, which is probably why the URL gets so long.
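Would restricting the fields we read keep the "_source" list (and therefore the URL) short enough? A rough sketch of what we mean, assuming the connector's "es.read.field.include" setting applies to our case (we haven't verified this) and using placeholder field names:

// Hypothetical sketch, not verified: read only a few of the 350 fields so the
// generated "_source" list stays well under the 4096-byte HTTP line limit.
// "fieldA", "fieldB", "fieldC" are placeholder field names.
val dfSubset = sparkSession
  .read
  .format("es")
  .option("pushdown", "false")
  .option("es.nodes", nodesUrl)
  .option("es.port", "9200")
  .option("es.read.field.include", "fieldA,fieldB,fieldC")
  .load(indexName)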
Is there any way to overcome this error?
Are we doing something wrong here?
Thanks