[HADOOP] [Spark] Problem with encoding of parentId containing backslash

Hi list,

I have an RDD that includes a field containing an ID which I'd like to use as
the parent document when I call saveToEs (all authored in Scala).
Something like this...

{
  "units_sold": 100,
  "unit_price": 8.99,
  "revenue": 899,
  "parentId": "maplin\staging(L28AF)" // i.e. it has a single backslash in it
}
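
For context, the save call looks roughly like this (a simplified sketch; the
index/type name and the Sale case class are stand-ins I've made up here, not
the real job):

  import org.apache.spark.{SparkConf, SparkContext}
  import org.elasticsearch.spark._ // brings saveToEs into scope on RDDs

  // stand-in for the real record type
  case class Sale(units_sold: Int, unit_price: Double, revenue: Double, parentId: String)

  val sc = new SparkContext(new SparkConf().setAppName("sales-to-es"))

  val sales = sc.makeRDD(Seq(
    Sale(100, 8.99, 899, "maplin\\staging(L28AF)") // one backslash; doubled only for the Scala string literal
  ))

  // es.mapping.parent tells es-hadoop which field to use as the _parent value
  sales.saveToEs("sales/transaction", Map("es.mapping.parent" -> "parentId"))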

This works fine until the parent ID contains the backslash (\) character, at
which point I get an exception. Escaping the backslash (\\) doesn't work for
me either: the job runs successfully, but the _parent field is set to the
value with the doubled backslash, so it doesn't reference the intended parent.
I'd love to remove the backslashes from my IDs, but this is, unfortunately,
part of a much bigger job :frowning:
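
The escaping attempt, reusing the sales RDD from the sketch above (again just
illustrative, not the real code):

  // Doubling the backslash lets the bulk request parse, but the document is
  // then indexed with _parent containing a literal two-character \\ sequence,
  // so it no longer matches the intended parent's ID.
  val escaped = sales.map(s => s.copy(parentId = s.parentId.replace("\\", "\\\\")))
  escaped.saveToEs("sales/transaction", Map("es.mapping.parent" -> "parentId"))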

Stack trace for the single-backslash version below...

15/01/28 17:05:46 WARN TaskSetManager: Lost task 3.3 in stage 0.2 (TID 425, SERVER1): org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: JsonParseException [Unrecognized character escape 's' (code 115) at [Source: [B@68700ab1; line: 1, column: 44]]; fragment[ent":"binlin\staglow]
        org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:322)
        org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:299)
        org.elasticsearch.hadoop.rest.RestClient.bulk(RestClient.java:149)
        org.elasticsearch.hadoop.rest.RestRepository.tryFlush(RestRepository.java:199)
        org.elasticsearch.hadoop.rest.RestRepository.flush(RestRepository.java:223)
        org.elasticsearch.hadoop.rest.RestRepository.close(RestRepository.java:236)
        org.elasticsearch.hadoop.rest.RestService$PartitionWriter.close(RestService.java:125)
        org.elasticsearch.spark.rdd.EsRDDWriter$$anonfun$write$1.apply$mcV$sp(EsRDDWriter.scala:33)
        org.apache.spark.TaskContext$$anon$2.onTaskCompletion(TaskContext.scala:99)
        org.apache.spark.TaskContext$$anonfun$markTaskCompleted$1.apply(TaskContext.scala:107)
        org.apache.spark.TaskContext$$anonfun$markTaskCompleted$1.apply(TaskContext.scala:107)
        scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        org.apache.spark.TaskContext.markTaskCompleted(TaskContext.scala:107)
        org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:64)
        org.apache.spark.scheduler.Task.run(Task.scala:54)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
        java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        java.lang.Thread.run(Unknown Source)

Many thanks for any advice or workarounds,

Neil A
