Inner_hits with elasticsearch-hadoop

I'm trying to use a nested query with inner_hits in combination with elasticsearch-hadoop. I'm on Elasticsearch 6.8.1, reading from Python with:

df = spark.read.format("org.elasticsearch.spark.sql")

I set "es.read.metadata": "true" (otherwise no inner_hits show up in the DataFrame at all), which results in the stack trace below. Is there any way to get inner_hits into the DataFrame?
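For reference, this is roughly how such a read is configured. A minimal sketch only: the index name, nested path, and field names below are placeholders I made up, not values from my actual setup.

```python
import json

# Nested query with inner_hits; "comments" is a hypothetical nested path.
query = {
    "query": {
        "nested": {
            "path": "comments",
            "query": {"match": {"comments.author": "alice"}},
            "inner_hits": {},
        }
    }
}

# elasticsearch-hadoop takes the query as a JSON string via "es.query".
query_json = json.dumps(query)

# Passed to the Spark SQL data source (requires a running Spark session
# and an Elasticsearch cluster, so shown here as a comment):
#
#   df = (spark.read.format("org.elasticsearch.spark.sql")
#         .option("es.nodes", "localhost:9200")
#         .option("es.read.metadata", "true")  # without this, no inner_hits at all
#         .option("es.query", query_json)
#         .load("my-index"))
```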

: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 3, eladsu7061.hadoop.internal, executor 7): org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: expected end of object, found END_ARRAY
    at org.elasticsearch.hadoop.util.Assert.isTrue(Assert.java:60)
    at org.elasticsearch.hadoop.serialization.ScrollReader.readHit(ScrollReader.java:433)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:292)
    at org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:262)
    at org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:313)
    at org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:93)
    at org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:61)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:99)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Answering my own question: see the related discussion in https://github.com/elastic/elasticsearch-hadoop/issues/460
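In the meantime, one possible workaround (my own sketch, not something from the linked issue) is to bypass the connector for this particular query: run the search with a plain HTTP/Python client, flatten the inner_hits yourself, and only then hand the rows to Spark. The hit shape below follows the standard Elasticsearch response format; the field names are made up for illustration.

```python
def extract_inner_hits(hit, path):
    """Flatten one raw search hit into (parent_source, nested_source) pairs.

    `hit` is a single element of response["hits"]["hits"] from a nested
    query that requested inner_hits; `path` is the nested path name.
    """
    nested = (
        hit.get("inner_hits", {})
        .get(path, {})
        .get("hits", {})
        .get("hits", [])
    )
    return [(hit["_source"], n["_source"]) for n in nested]


# Example hit, shaped like an Elasticsearch response for a nested query
# with inner_hits on a hypothetical "comments" path:
hit = {
    "_id": "1",
    "_source": {"title": "post"},
    "inner_hits": {
        "comments": {"hits": {"hits": [{"_source": {"author": "alice"}}]}}
    },
}

rows = extract_inner_hits(hit, "comments")
# rows == [({"title": "post"}, {"author": "alice"})]
```

The flattened tuples can then be turned into a DataFrame with something like spark.createDataFrame(rows), sidestepping the connector's ScrollReader entirely for this query.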

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.