Process:
I receive data from Spark as a Dataset. It is then mapped to MyObject (matching the ES mapping) with the map() function, and the result is converted to an RDD and passed to JavaEsSpark.saveToEs().
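End to end, the flow looks roughly like this (a sketch, not the real code: spark is an existing SparkSession, the parquet source is illustrative, and rowToMyObject stands in for the actual mapping logic):

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.elasticsearch.hadoop.cfg.ConfigurationOptions;
import org.elasticsearch.spark.rdd.api.java.JavaEsSpark;
import com.google.common.collect.ImmutableMap;

// Read the source data (the source is illustrative)
Dataset<Row> input = spark.read().parquet("/path/to/input");

// Map each Row to MyObject, then convert to an RDD for es-hadoop
JavaRDD<MyObject> rdd = input.toJavaRDD().map(row -> rowToMyObject(row));

// Write to Elasticsearch, declaring "relation" as the join field
JavaEsSpark.saveToEs(rdd,
        "myIndex/myType",
        ImmutableMap.of(ConfigurationOptions.ES_MAPPING_JOIN, "relation"));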
public class MyObject implements Serializable {
    // some fields
    private MyRelation relation; // the join field
}

public class MyRelation implements Serializable {
    private String parent;         // id of the parent document
    private String name = "child"; // relation name from the mapping

    public MyRelation(String parent) {
        this.parent = parent;
    }
}
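A child record is built before the map() step roughly like this (a sketch: the setRelation setter and the "parentDocId" value are assumed here, since the classes above elide accessors):

MyObject child = new MyObject();
// ... populate the other fields ...
child.setRelation(new MyRelation("parentDocId")); // hypothetical setter; name defaults to "child"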
This is how data is inserted:
JavaEsSpark.saveToEs(
        rdd,
        "myIndex/myType",
        ImmutableMap.of(
                ConfigurationOptions.ES_MAPPING_JOIN, "relation"));
Relation from the mapping:
"relation": {
"type": "join",
"eager_global_ordinals": true,
"relations": {
"parent": "child"
}
},
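For context, a join field requires every child document to be indexed with a routing value, normally the parent's id, so that parent and child land on the same shard; a child's source then looks along these lines (ids illustrative):

{
  "relation": {
    "name": "child",
    "parent": "1"
  }
}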
And, finally, the error:
Previous handler messages:
Non retryable code [400] encountered.
org.elasticsearch.hadoop.rest.EsHadoopRemoteException: mapper_parsing_exception: failed to parse
at org.elasticsearch.hadoop.rest.ErrorExtractor.extractErrorWithCause(ErrorExtractor.java:30)
at org.elasticsearch.hadoop.rest.ErrorExtractor.extractError(ErrorExtractor.java:60)
at org.elasticsearch.hadoop.rest.bulk.BulkProcessor.tryFlush(BulkProcessor.java:245)
at org.elasticsearch.hadoop.rest.bulk.BulkProcessor.flush(BulkProcessor.java:499)
at org.elasticsearch.hadoop.rest.bulk.BulkProcessor.close(BulkProcessor.java:541)
at org.elasticsearch.hadoop.rest.RestRepository.close(RestRepository.java:219)
at org.elasticsearch.hadoop.rest.RestService$PartitionWriter.close(RestService.java:121)
at org.elasticsearch.spark.rdd.EsRDDWriter$$anonfun$write$1.apply(EsRDDWriter.scala:60)
at org.elasticsearch.spark.rdd.EsRDDWriter$$anonfun$write$1.apply(EsRDDWriter.scala:60)
at org.apache.spark.TaskContext$$anon$1.onTaskCompletion(TaskContext.scala:128)
at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:118)
at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:118)
at org.apache.spark.TaskContextImpl$$anonfun$invokeListeners$1.apply(TaskContextImpl.scala:131)
at org.apache.spark.TaskContextImpl$$anonfun$invokeListeners$1.apply(TaskContextImpl.scala:129)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:129)
at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:117)
at org.apache.spark.scheduler.Task.run(Task.scala:125)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.elasticsearch.hadoop.rest.EsHadoopRemoteException: illegal_argument_exception: [routing] is missing for join field [relation]
at org.elasticsearch.hadoop.rest.ErrorExtractor.extractErrorWithCause(ErrorExtractor.java:30)
at org.elasticsearch.hadoop.rest.ErrorExtractor.extractErrorWithCause(ErrorExtractor.java:39)
Before inserting the children, I insert the parent, and it is written to ES successfully.
Adding ConfigurationOptions.ES_MAPPING_ROUTING alongside the existing options didn't help.
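For example, a variant like this, with the routing pointed at the parent id (the exact source field name is illustrative), didn't change the outcome:

JavaEsSpark.saveToEs(
        rdd,
        "myIndex/myType",
        ImmutableMap.of(
                ConfigurationOptions.ES_MAPPING_JOIN, "relation",
                ConfigurationOptions.ES_MAPPING_ROUTING, "relation.parent"));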
The same documents without the relation field are written successfully.
Any help would be much appreciated!