Spark ES Read Error

We have several indices stored in our ES (7.1.1) cluster. Edit: Spark is 2.3.1, Scala 2.11.8, Hadoop HDP 3.0.1.

When reading an index as a DataFrame with
val reader = spark.read.format("org.elasticsearch.spark.sql").option("es.nodes", nodeList)
val x = reader.load("small_index")
x.show()

for the smaller index (about a million entries), it works fine.

For a larger index (3 billion entries), we get the error below after any action on the DataFrame. The job has over 300G allocated, so it should not be a Spark resource issue (?).

Any suggestions are welcome.

> org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: invalid response
>         at org.elasticsearch.hadoop.util.Assert.isTrue(Assert.java:60)
>         at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:271)
>         at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:262)
>         at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:313)
>         at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:93)
>         at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:61)
>         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>         at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.agg_doAggregateWithoutKey_0$(Unknown Source)
>         at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
>         at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>         at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
>         at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>         at org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:125)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
>         at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
>         at org.apache.spark.scheduler.Task.run(Task.scala:109)
>         at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>         at java.lang.Thread.run(Thread.java:748)

I would suggest to:

Thanks for the response.

Just changing those settings (I tried size at 1000, 20000, and 10000) had no effect.
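
For anyone following along, here is a minimal sketch of how a scroll size can be passed per read. It assumes the "size" mentioned above is the es-hadoop setting `es.scroll.size`, which the thread does not name explicitly. The helper `EsReadOptions.build` and the index name `large_index` are hypothetical, for illustration only.

```scala
// Sketch only: assumes the "size" setting refers to es-hadoop's es.scroll.size.
object EsReadOptions {
  // Builds the option map for the "org.elasticsearch.spark.sql" reader.
  def build(nodes: String, scrollSize: Int): Map[String, String] = {
    require(scrollSize > 0, "scroll size must be positive")
    Map(
      "es.nodes"       -> nodes,
      "es.scroll.size" -> scrollSize.toString // documents requested per scroll call
    )
  }
}

// Usage inside a Spark session (spark and nodeList as in the question):
// val x = spark.read.format("org.elasticsearch.spark.sql")
//   .options(EsReadOptions.build(nodeList, 1000))
//   .load("large_index") // hypothetical index name
// x.show()
```

Passing the values as reader options keeps the change local to the job, so no cluster-level config needs to be touched.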

Is there any logging I can change with user permissions? Getting the admins to change Hadoop config files requires some time and justification.
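
One possible approach that needs no config-file changes, offered as a sketch rather than a confirmed fix: raise the log level from inside the job itself. This assumes the log4j 1.x API that ships with Spark 2.3.x, and it uses the logger name `org.elasticsearch.hadoop.rest` taken from the package in the stack trace. The object `EsDebugLogging` and the `tasks` parameter are hypothetical. The executor part is best effort: a throwaway job is not guaranteed to reach every executor, and executors started later will not be affected.

```scala
import org.apache.log4j.{Level, Logger}
import org.apache.spark.sql.SparkSession

// Sketch only: assumes log4j 1.x is the logging backend (as in Spark 2.3.x).
object EsDebugLogging {
  // Logger name taken from the package seen in the stack trace.
  val EsRestLogger = "org.elasticsearch.hadoop.rest"

  def enable(spark: SparkSession, tasks: Int): Unit = {
    // Driver JVM: takes effect immediately.
    Logger.getLogger(EsRestLogger).setLevel(Level.TRACE)

    // Executor JVMs: run a small throwaway job so each task sets the level
    // in whichever executor it lands on (best effort, not guaranteed).
    spark.sparkContext
      .parallelize(1 to tasks, tasks)
      .foreachPartition { _ =>
        Logger.getLogger(EsRestLogger).setLevel(Level.TRACE)
      }
  }
}

// Usage, before the failing action (spark as in the question):
// EsDebugLogging.enable(spark, tasks = 200) // 200 is an arbitrary example value
```

The extra output should then show up in the executor logs, which may make it easier to see what the scroll request actually received before the "invalid response" assertion fired.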