Hi everyone,
I am using Elasticsearch Hadoop 6.2.3 with Spark 2.3.0 and am trying to read data from Elasticsearch as a Dataset. However, I am getting the following exception:
WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 172.18.0.2, executor 0): java.lang.IndexOutOfBoundsException: 1
at scala.collection.convert.Wrappers$JListWrapper.productElement(Wrappers.scala:85)
at scala.runtime.ScalaRunTime$$anon$1.next(ScalaRunTime.scala:177)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:251)
I am using Scala and have the following case class:
case class MyEntity (
  ...
  geometry: Geometry
  ...
)
Geometry looks like this (the type field is escaped with backticks because type is a Scala keyword):
case class Geometry (
  `type`: String,
  coordinates: GeoPoint
)
And GeoPoint looks like this:
case class GeoPoint (
  lat: Double,
  lon: Double
)
The schema of the data looks as follows:
[info] root
...
[info] |-- geometry: struct (nullable = true)
[info] | |-- coordinates: struct (nullable = true)
[info] | | |-- lat: double (nullable = true)
[info] | | |-- lon: double (nullable = true)
[info] | |-- type: string (nullable = true)
...
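For comparison, the schema Spark derives from the case classes can be inspected like this; a minimal, self-contained sketch of that check (geometry part only, with the class definitions repeated so it compiles on its own):
import org.apache.spark.sql.Encoders

object GeometrySchemaCheck {
  case class GeoPoint(lat: Double, lon: Double)
  case class Geometry(`type`: String, coordinates: GeoPoint)

  def main(args: Array[String]): Unit = {
    // Print the struct Spark's encoder derives from the case class,
    // to compare field names and nesting with the esDF schema above.
    Encoders.product[Geometry].schema.printTreeString()
  }
}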
I am reading the data as a Dataset as follows:
import org.elasticsearch.spark.sql._
import sqlContext.implicits._

val myEntities = sqlContext
  .esDF("indexname/doctype")
  .select(
    ...
    $"geometry",
    ...
  )
  .as[MyEntity]
Without the geometry field, I can read the data from Elasticsearch successfully, so the problem is definitely related to it.
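The only workaround I can think of so far is to not select the geometry struct as a whole, but to select its leaf fields and rebuild the nested case classes myself. Below is a rough sketch of that idea; the flattened helper class and its field names are my own, and I have not verified that it actually avoids the exception:
import org.apache.spark.sql.SparkSession
import org.elasticsearch.spark.sql._

object FlattenedGeometryRead {
  case class GeoPoint(lat: Double, lon: Double)
  case class Geometry(`type`: String, coordinates: GeoPoint)
  // Flattened intermediate shape; names chosen by me
  case class FlatGeometry(geoType: String, lat: Double, lon: Double)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("es-flat-read").getOrCreate()
    import spark.implicits._

    val flat = spark.sqlContext
      .esDF("indexname/doctype")
      .select(
        $"geometry.type".as("geoType"),
        $"geometry.coordinates.lat".as("lat"),
        $"geometry.coordinates.lon".as("lon")
      )
      .as[FlatGeometry]

    // Rebuild the nested structure on the Scala side
    val geometries = flat.map(f => Geometry(f.geoType, GeoPoint(f.lat, f.lon)))
    geometries.show(5)
  }
}
I would rather keep the nested case classes and read the struct directly, though.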
Do you have an idea what causes this issue and how to work around it?
Thanks!