[HADOOP] [Spark] Problem with encoding of parentId containing backslash

Hi list,

I have an RDD that includes a field containing an ID which I'd like to use as
the parent document when I call saveToEs (all authored in Scala).
Something like this...

{
  "units_sold": 100,
  "unit_price": 8.99,
  "revenue": 899,
  "parentId": "maplin\staging(L28AF)" // i.e. it has a single backslash in it
}
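
For context, the save call looks roughly like this (a simplified sketch; the
index/type name and the Sale case class are stand-ins I've made up here, not
the real job):

  import org.apache.spark.{SparkConf, SparkContext}
  import org.elasticsearch.spark._ // brings saveToEs into scope on RDDs

  // stand-in for the real record type
  case class Sale(units_sold: Int, unit_price: Double, revenue: Double, parentId: String)

  val sc = new SparkContext(new SparkConf().setAppName("sales-to-es"))

  val sales = sc.makeRDD(Seq(
    Sale(100, 8.99, 899, "maplin\\staging(L28AF)") // one backslash; doubled only for the Scala string literal
  ))

  // es.mapping.parent tells es-hadoop which field to use as the _parent value
  sales.saveToEs("sales/transaction", Map("es.mapping.parent" -> "parentId"))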

This works fine until the parent ID contains the backslash (\) character, at
which point I get an exception. Escaping the backslash (\\) doesn't work for
me either: the job runs successfully, but the _parent field is set to the
value with the doubled backslash, so it doesn't reference the intended parent.
I'd love to remove the backslashes from my IDs, but this is, unfortunately,
part of a much bigger job :frowning:
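
The escaping attempt, reusing the sales RDD from the sketch above (again just
illustrative, not the real code):

  // Doubling the backslash lets the bulk request parse, but the document is
  // then indexed with _parent containing a literal two-character \\ sequence,
  // so it no longer matches the intended parent's ID.
  val escaped = sales.map(s => s.copy(parentId = s.parentId.replace("\\", "\\\\")))
  escaped.saveToEs("sales/transaction", Map("es.mapping.parent" -> "parentId"))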

Stack trace for the single-backslash version below...

15/01/28 17:05:46 WARN TaskSetManager: Lost task 3.3 in stage 0.2 (TID 425, SERVER1): org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: JsonParseException [Unrecognized character escape 's' (code 115) at [Source: [B@68700ab1; line: 1, column: 44]]; fragment[ent":"binlin\staglow]
        org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:322)
        org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:299)
        org.elasticsearch.hadoop.rest.RestClient.bulk(RestClient.java:149)
        org.elasticsearch.hadoop.rest.RestRepository.tryFlush(RestRepository.java:199)
        org.elasticsearch.hadoop.rest.RestRepository.flush(RestRepository.java:223)
        org.elasticsearch.hadoop.rest.RestRepository.close(RestRepository.java:236)
        org.elasticsearch.hadoop.rest.RestService$PartitionWriter.close(RestService.java:125)
        org.elasticsearch.spark.rdd.EsRDDWriter$$anonfun$write$1.apply$mcV$sp(EsRDDWriter.scala:33)
        org.apache.spark.TaskContext$$anon$2.onTaskCompletion(TaskContext.scala:99)
        org.apache.spark.TaskContext$$anonfun$markTaskCompleted$1.apply(TaskContext.scala:107)
        org.apache.spark.TaskContext$$anonfun$markTaskCompleted$1.apply(TaskContext.scala:107)
        scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
        scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
        org.apache.spark.TaskContext.markTaskCompleted(TaskContext.scala:107)
        org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:64)
        org.apache.spark.scheduler.Task.run(Task.scala:54)
        org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
        java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
        java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        java.lang.Thread.run(Unknown Source)

Many thanks for any advice or workarounds,

Neil A
