[HADOOP] [Spark] Problem with encoding of parentId containing backslash

Hi list,

I have an RDD that includes a field containing an ID that I'd like to use as
the parent document when I execute saveToEs (all authored in Scala).
Something like this...

{
"units_sold": 100,
"unit_price": 8.99,
"revenue": 899,
"parentId": "binlin\staglow(L28AF)" //i.e. it has a single backlash in
it
}

This works fine until my parent ID contains the backslash (\) character, at
which point I get an exception. Escaping the backslash (\\) doesn't work for
me either - the job runs successfully, but the _parent field is set to the
value with the double backslash, so it doesn't reference the intended parent.
I'd love to remove the backslashes from my IDs, but this is, unfortunately,
part of a much bigger job :frowning:
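
For reference, the write itself is essentially just this (the index/type name
is a placeholder; sc is the usual SparkContext):

import org.elasticsearch.spark._

val docs = sc.makeRDD(Seq(Map(
  "units_sold" -> 100,
  "unit_price" -> 8.99,
  "revenue"    -> 899,
  "parentId"   -> """binlin\staglow(L28AF)""" // single backslash at runtime
)))

// es.mapping.parent names the document field to use as _parent
docs.saveToEs("sales/order", Map("es.mapping.parent" -> "parentId"))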

Stack trace for the single-backslash version below...

15/01/28 17:05:46 WARN TaskSetManager: Lost task 3.3 in stage 0.2 (TID 425, SERVER1): org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: JsonParseException [Unrecognized character escape 's' (code 115) at [Source: [B@68700ab1; line: 1, column: 44]]; fragment[ent":"binlin\staglow]
    org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:322)
    org.elasticsearch.hadoop.rest.RestClient.execute(RestClient.java:299)
    org.elasticsearch.hadoop.rest.RestClient.bulk(RestClient.java:149)
    org.elasticsearch.hadoop.rest.RestRepository.tryFlush(RestRepository.java:199)
    org.elasticsearch.hadoop.rest.RestRepository.flush(RestRepository.java:223)
    org.elasticsearch.hadoop.rest.RestRepository.close(RestRepository.java:236)
    org.elasticsearch.hadoop.rest.RestService$PartitionWriter.close(RestService.java:125)
    org.elasticsearch.spark.rdd.EsRDDWriter$$anonfun$write$1.apply$mcV$sp(EsRDDWriter.scala:33)
    org.apache.spark.TaskContext$$anon$2.onTaskCompletion(TaskContext.scala:99)
    org.apache.spark.TaskContext$$anonfun$markTaskCompleted$1.apply(TaskContext.scala:107)
    org.apache.spark.TaskContext$$anonfun$markTaskCompleted$1.apply(TaskContext.scala:107)
    scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    org.apache.spark.TaskContext.markTaskCompleted(TaskContext.scala:107)
    org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:64)
    org.apache.spark.scheduler.Task.run(Task.scala:54)
    org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
    java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    java.lang.Thread.run(Unknown Source)

Many thanks for any advice or workarounds,

Neil A


What es-hadoop/spark version are you using? Can you post a snippet/gist showing how you are calling saveToEs and what the
es-spark configuration looks like (does the RDD contain JSON or rich objects, etc.)?
There are multiple ways to specify the parentId, and in master (the dev build) this should work without a problem.


--
Costin


I'm using the ES-Hadoop 1.2.0.Beta3 Spark variant with Scala 2.10.4 and Spark
1.1.0 for Hadoop 2.4 (but without an actual Hadoop installation - I'm running
on Windows).

I'm working with a Map-based RDD rather than JSON.

https://gist.github.com/andrassy/273179ed7cb01a38973d is a short example
that throws an exception.

I'll also try the JSON approach and see if that works for me.

Thanks,

Neil


I suggest trying master (the dev build - see the docs for more
information [1]). You should not have to use the JSON format. By the way,
one addition in master is that you can use case classes instead of Maps, and
es-spark will know how to serialize them - that, plus having the metadata
separated from the doc itself [2]. A sketch of the case class route follows below.

[1] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/master/install.html#download-dev
[2] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/master/spark.html#spark-write-meta
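
Something like this (class, field and index names are just illustrative; sc is
your SparkContext):

import org.elasticsearch.spark._

// in master, es-spark can serialize a case class directly
case class Sale(units_sold: Int, unit_price: Double, revenue: Double, parentId: String)

val sales = sc.makeRDD(Seq(Sale(100, 8.99, 899, """binlin\staglow(L28AF)""")))

// es.mapping.parent points at the field holding the parent document id
sales.saveToEs("sales/order", Map("es.mapping.parent" -> "parentId"))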


I get the same problem with the JSON string approach too.


Not sure if you've seen my previous message, but please try out master.


Works fine in master - see the comment added to your gist.


--
Costin


Hi Costin,

Thanks for your help. I saw your message, but it's taken me a while to
resolve and build with the nightly snapshot. I'm using SBT and I couldn't
figure out how to add and resolve the dependency; I eventually pulled the
file down locally and dropped it into my lib directory, and now I can compile
again using elasticsearch-spark_2.10-2.1.0.BUILD-20150130.023537-206.jar.
I guess maybe that's for another post, but anyway...

The bad news is that I still get more or less the same problem (but with
Jackson in there - this is from the Map version, but I get a very similar
exception from the JSON string version). Somewhere it just seems very
unhappy about having \s in the parentId.

The good news is that I can get the case class variant using the PairRDD
to work when I set both ID and PARENT in the metadata
(using saveToEsWithMeta). This gets me back on the road, but I think
there's still an issue setting the parent from either JSON or Map using
es.mapping.parent in master.
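
For reference, the working variant looks roughly like this (ids, class and
index names are illustrative):

import org.elasticsearch.spark._
import org.elasticsearch.spark.rdd.Metadata._

case class Sale(units_sold: Int, unit_price: Double, revenue: Double)

// pair RDD of (metadata, document); ID and PARENT live outside the doc body,
// so no JSON escaping of the parent id is involved
val pair = (Map(ID -> "sale-1", PARENT -> """binlin\staglow(L28AF)"""),
            Sale(100, 8.99, 899))

sc.makeRDD(Seq(pair)).saveToEsWithMeta("sales/order")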

Thanks again,

Neil

Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 1 in stage 0.0 failed 4 times, most recent failure: Lost task 1.3 in stage 0.0 (TID 4, SERVER1): org.elasticsearch.hadoop.serialization.EsHadoopSerializationException: org.codehaus.jackson.JsonParseException: Unrecognized character escape 's' (code 115) at [Source: [B@2a7b201e; line: 3, column: 17]
    org.elasticsearch.hadoop.serialization.json.JacksonJsonParser.text(JacksonJsonParser.java:153)
    org.elasticsearch.hadoop.serialization.ParsingUtils.doFind(ParsingUtils.java:211)
    org.elasticsearch.hadoop.serialization.ParsingUtils.values(ParsingUtils.java:150)
    org.elasticsearch.hadoop.serialization.field.JsonFieldExtractors.process(JsonFieldExtractors.java:201)
    org.elasticsearch.hadoop.serialization.bulk.JsonTemplatedBulk.preProcess(JsonTemplatedBulk.java:64)
    org.elasticsearch.hadoop.serialization.bulk.TemplatedBulk.write(TemplatedBulk.java:54)
    org.elasticsearch.hadoop.rest.RestRepository.writeToIndex(RestRepository.java:145)
    org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:47)
    org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:51)
    org.elasticsearch.spark.rdd.EsSpark$$anonfun$saveToEs$1.apply(EsSpark.scala:51)
    org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
    org.apache.spark.scheduler.Task.run(Task.scala:54)
    org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
    java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    java.lang.Thread.run(Unknown Source)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
    at scala.Option.foreach(Option.scala:236)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
    at akka.actor.ActorCell.invoke(ActorCell.scala:456)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
    at akka.dispatch.Mailbox.run(Mailbox.scala:219)
    at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
    at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
    at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
    at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
    at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)


The dev build is available in a Maven repo, so sbt should be able to download it from there without you having to
download it manually.

es-hadoop automatically escapes characters before translating to JSON. That happens for objects that get converted,
such as a case class or a Map.
If, however, you pass the input as JSON, then es-spark takes the content as-is and uses it accordingly. In other
words, it's your responsibility to make sure the JSON is valid, since es-spark simply passes it along unchanged.
"a\s" is not a valid JSON string, but "a\\s" is, so make sure to take that into account.

Once things stabilize, please come back with a short, concrete example testing the parent (with Map and/or JSON input),
just like you did with your initial gist, as I am unable to reproduce the issue...

Cheers,


--
Costin
