Hi,
I'm currently trying to load some data into Elasticsearch. I'm using the following snippet to write to ES:
// input here is an RDD of JSON strings
EsSpark.saveJsonToEs(input, "offers/product", Map[String, String]("es.mapping.id" -> "lumi_name"))
There seem to be some malformed strings in the data that cause the job to fail:
6/11/21 15:21:27 ERROR Executor: Exception in task 5021.3 in stage 1.0 (TID 42312)
org.elasticsearch.hadoop.serialization.EsHadoopSerializationException: org.codehaus.jackson.JsonParseException: Unexpected character ('h' (code 104)): was expecting comma to separate OBJECT entries
at [Source: [B@1a1d6f03; line: 1, column: 20]
at org.elasticsearch.hadoop.serialization.json.JacksonJsonParser.nextToken(JacksonJsonParser.java:95)
at org.elasticsearch.hadoop.serialization.ParsingUtils.doFind(ParsingUtils.java:167)
at org.elasticsearch.hadoop.serialization.ParsingUtils.values(ParsingUtils.java:150)
at org.elasticsearch.hadoop.serialization.field.JsonFieldExtractors.process(JsonFieldExtractors.java:201)
at org.elasticsearch.hadoop.serialization.bulk.JsonTemplatedBulk.preProcess(JsonTemplatedBulk.java:64)
at org.elasticsearch.hadoop.serialization.bulk.TemplatedBulk.write(TemplatedBulk.java:54)
at org.elasticsearch.hadoop.rest.RestRepository.writeToIndex(RestRepository.java:158)
Is there a way I can catch this error while writing to ES, so that the job doesn't fail and I can handle the bad records gracefully?
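The only workaround I can think of is to pre-validate each record before handing it to the connector, along the lines of the sketch below (it uses Jackson's ObjectMapper.readTree to test parseability; filterValidJson is just a name I made up), but I'd prefer something built into the connector if it exists:

import com.fasterxml.jackson.databind.ObjectMapper
import org.apache.spark.rdd.RDD
import org.elasticsearch.spark.rdd.EsSpark

// Hypothetical pre-filter: keep only the strings Jackson can parse
def filterValidJson(input: RDD[String]): RDD[String] =
  input.mapPartitions { records =>
    // ObjectMapper is not serializable, so build one per partition
    val mapper = new ObjectMapper()
    records.filter { s =>
      try { mapper.readTree(s); true }
      catch { case _: Exception => false }
    }
  }

EsSpark.saveJsonToEs(filterValidJson(input), "offers/product",
  Map[String, String]("es.mapping.id" -> "lumi_name"))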
Thanks,
Sushant