Unable to overwrite dataframe when documents have parent

I am trying to write a DataFrame to Elasticsearch from Spark using ES-Hadoop 5.5.1. The documents are being written as children of pre-existing parent documents. Everything works correctly the first time the DataFrame is saved; however, it fails when attempting an overwrite with:

  File "/usr/local/Cellar/apache-spark/2.1.1/libexec/python/lib/pyspark.zip/pyspark/sql/readwriter.py", line 550, in save
  File "/usr/local/Cellar/apache-spark/2.1.1/libexec/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
  File "/usr/local/Cellar/apache-spark/2.1.1/libexec/python/lib/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/usr/local/Cellar/apache-spark/2.1.1/libexec/python/lib/py4j-0.10.4-src.zip/py4j/protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o153.save.
: org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: Found unrecoverable error [127.0.0.1:9205] returned    Bad Request(400) - routing is required for [test_content]/[scorearticle]/[0-news-trending-movavg-scorearticle-ZW0382A001S00]; Bailing out..
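
For context, the write looks roughly like the following. This is a minimal sketch, not my actual job: the `doc_id` and `parent_id` column names are assumptions, while the index/type and ES address come from the error above.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical child documents; column names are placeholders.
df = spark.createDataFrame(
    [("child-1", "parent-1", 0.42)],
    ["doc_id", "parent_id", "score"],
)

df.write \
    .format("org.elasticsearch.spark.sql") \
    .option("es.nodes", "127.0.0.1") \
    .option("es.port", "9205") \
    .option("es.mapping.id", "doc_id") \
    .option("es.mapping.parent", "parent_id") \
    .mode("overwrite") \
    .save("test_content/scorearticle")
```

The same write with mode "append" succeeds; only the overwrite, which appears to empty the index with bulk delete requests before writing, fails.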

Looking at the code, it does not appear that the routing or parent of the documents being deleted is included when building the bulk delete requests.
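
In bulk API terms, this would look something like the comparison below. This is a hand-written illustration, not the connector's actual output, and the `_routing` metadata key in the bulk action is my assumption for 5.x:

```python
# What the overwrite's truncation appears to send for each document:
{"delete": {"_index": "test_content", "_type": "scorearticle",
            "_id": "0-news-trending-movavg-scorearticle-ZW0382A001S00"}}

# What deleting a child document actually requires -- the parent's ID
# carried as the routing value:
{"delete": {"_index": "test_content", "_type": "scorearticle",
            "_id": "0-news-trending-movavg-scorearticle-ZW0382A001S00",
            "_routing": "parent-id-here"}}
```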

Is there any way to get around this, or do we need to delete the records ourselves prior to writing?
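
For reference, a minimal sketch of that manual-delete approach, assuming the elasticsearch-py client (the index and type come from the error above; everything else is an assumption):

```python
from elasticsearch import Elasticsearch
from elasticsearch.helpers import bulk, scan

es = Elasticsearch(["127.0.0.1:9205"])

# Scroll over the existing children and build delete actions that carry
# each hit's routing, so Elasticsearch can address the parent's shard.
actions = (
    {
        "_op_type": "delete",
        "_index": hit["_index"],
        "_type": hit["_type"],
        "_id": hit["_id"],
        "_routing": hit["_routing"],
    }
    for hit in scan(es, index="test_content", doc_type="scorearticle",
                    query={"query": {"match_all": {}}})
)
bulk(es, actions)
```

After the deletes, the DataFrame could be saved with mode "append", which avoids the connector's truncation path entirely.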

Yeah, that would indeed be the case here. Thanks for bringing this to our attention. Could you open an issue on the GitHub project with this information?

Will do.
