Hi everyone,
I am using Elasticsearch Hadoop 6.2.3 with Spark 2.3.0 and am trying to read data from Elasticsearch as a Dataset. However, I am getting the following exception:
WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 172.18.0.2, executor 0): java.lang.IndexOutOfBoundsException: 1
at scala.collection.convert.Wrappers$JListWrapper.productElement(Wrappers.scala:85)
at scala.runtime.ScalaRunTime$$anon$1.next(ScalaRunTime.scala:177)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:251)
I am using Scala and have the following case class:
case class MyEntity (
  ...
  geometry: Geometry
  ...
)
Geometry looks like this (the type field is escaped with backticks because type is a Scala keyword):
case class Geometry (
  `type`: String,
  coordinates: GeoPoint
)
And GeoPoint looks like this:
case class GeoPoint (
  lat: Double,
  lon: Double
)
The schema of the data looks as follows:
[info] root
...
[info] |-- geometry: struct (nullable = true)
[info] | |-- coordinates: struct (nullable = true)
[info] | | |-- lat: double (nullable = true)
[info] | | |-- lon: double (nullable = true)
[info] | |-- type: string (nullable = true)
...
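For comparison, the schema Spark derives from the case classes can be inspected like this; a minimal, self-contained sketch of that check (geometry part only, with the class definitions repeated so it compiles on its own):
import org.apache.spark.sql.Encoders

object GeometrySchemaCheck {
  case class GeoPoint(lat: Double, lon: Double)
  case class Geometry(`type`: String, coordinates: GeoPoint)

  def main(args: Array[String]): Unit = {
    // Print the struct Spark's encoder derives from the case class,
    // to compare field names and nesting with the esDF schema above.
    Encoders.product[Geometry].schema.printTreeString()
  }
}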
I am reading the data as a Dataset as follows:
import org.elasticsearch.spark.sql._
import sqlContext.implicits._

val myEntities = sqlContext
  .esDF("indexname/doctype")
  .select(
    ...
    $"geometry",
    ...
  )
  .as[MyEntity]
Without the geometry field, I can read the data from Elasticsearch successfully, so the problem is definitely related to it.
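The only workaround I can think of so far is to not select the geometry struct as a whole, but to select its leaf fields and rebuild the nested case classes myself. Below is a rough sketch of that idea; the flattened helper class and its field names are my own, and I have not verified that it actually avoids the exception:
import org.apache.spark.sql.SparkSession
import org.elasticsearch.spark.sql._

object FlattenedGeometryRead {
  case class GeoPoint(lat: Double, lon: Double)
  case class Geometry(`type`: String, coordinates: GeoPoint)
  // Flattened intermediate shape; names chosen by me
  case class FlatGeometry(geoType: String, lat: Double, lon: Double)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("es-flat-read").getOrCreate()
    import spark.implicits._

    val flat = spark.sqlContext
      .esDF("indexname/doctype")
      .select(
        $"geometry.type".as("geoType"),
        $"geometry.coordinates.lat".as("lat"),
        $"geometry.coordinates.lon".as("lon")
      )
      .as[FlatGeometry]

    // Rebuild the nested structure on the Scala side
    val geometries = flat.map(f => Geometry(f.geoType, GeoPoint(f.lat, f.lon)))
    geometries.show(5)
  }
}
I would rather keep the nested case classes and read the struct directly, though.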
Do you have an idea what causes this issue and how to work around it?
Thanks!