Elasticsearch Hadoop - Issue reading GeoPoint as Dataset


#1

Hi everyone,

I am using Elasticsearch Hadoop 6.2.3 with Spark 2.3.0 and am trying to read data from Elasticsearch as a Dataset. However, I am getting the following exception:

WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 172.18.0.2, executor 0): java.lang.IndexOutOfBoundsException: 1
at scala.collection.convert.Wrappers$JListWrapper.productElement(Wrappers.scala:85)
at scala.runtime.ScalaRunTime$$anon$1.next(ScalaRunTime.scala:177)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:251)

I am using Scala and have the following case class:
case class MyEntity (
  ...
  geometry: Geometry
  ...
)

Geometry looks like this:
case class Geometry (
  `type`: String,  // `type` is a reserved word in Scala, so it needs backticks
  coordinates: GeoPoint
)

And GeoPoint looks like this:
case class GeoPoint (
  lat: Double,
  lon: Double
)

The schema of the data looks as follows:
[info] root
...
[info]  |-- geometry: struct (nullable = true)
[info]  |    |-- coordinates: struct (nullable = true)
[info]  |    |    |-- lat: double (nullable = true)
[info]  |    |    |-- lon: double (nullable = true)
[info]  |    |-- type: string (nullable = true)
...

I am reading the data as a Dataset as follows:
val myEntities = sqlContext
  .esDF("indexname/doctype")
  .select(
    ...
    $"geometry",
    ...
  )
  .as[MyEntity]

Without the geometry, I can successfully read the data from Elasticsearch, so it is definitely related to that.

Do you have any idea what causes this issue, and do you know how to work around it?

Thanks!


#2

By the way, it seems to happen within the select statement, not during the conversion to a Dataset: I get the same error when I remove ".as[MyEntity]" and just call "myEntities.show".
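
A note on reading the stack trace above, offered as a likely interpretation rather than a confirmed diagnosis: the frames suggest that the struct converter received a JListWrapper (a Java list wrapped for Scala) where the schema promised a struct with two fields, and then pulled fields from the wrapper's product iterator. A wrapper like that is a case class with a single field, so asking for a second field fails. The sketch below reproduces only that mechanism in plain Scala. ListWrapper, ProductArityDemo, readFields and the coordinate values are hypothetical stand-ins invented for illustration; none of them are part of Spark or elasticsearch-hadoop.

```scala
import scala.util.Try

// Hypothetical stand-in for a one-field wrapper such as JListWrapper:
// a case class (hence a Product) whose only field is the wrapped list.
final case class ListWrapper(underlying: java.util.List[Double])

object ProductArityDemo {
  // Mimics a converter that expects `numFields` struct fields and pulls
  // them one by one from the value's product iterator.
  def readFields(value: Product, numFields: Int): Try[Seq[Any]] = Try {
    val it = value.productIterator
    (0 until numFields).map(_ => it.next())
  }

  def main(args: Array[String]): Unit = {
    // Made-up coordinates, stored as a list instead of a lat/lon struct.
    val point = ListWrapper(java.util.Arrays.asList(8.5, 47.4))

    // The wrapper exposes exactly one product element: the list itself.
    println(s"productArity = ${point.productArity}")

    // Reading one field works; reading two (lat and lon) fails.
    println(s"one field:  ${readFields(point, 1)}")
    println(s"two fields: ${readFields(point, 2)}")
  }
}
```

If this reading is right, the error comes from a mismatch between the inferred schema (a lat/lon struct) and the shape of the value actually returned for the document, which would also explain why it shows up on "show" without ".as[MyEntity]".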


#3

This might be related: https://github.com/elastic/elasticsearch-hadoop/issues/951


(James Baiera) #4

It is highly likely that this is related to #951.

Can you post your reproduction from above on that issue? The more test cases we have on file when I tackle it, the better the solution will be tested before it gets released.


#5

Thanks for your quick reply. I have commented on the GitHub issue.

