Catch EsHadoopSerializationException while saving from Spark

(Sushant Hiray) #1


I'm currently trying to load some data into Elasticsearch. I'm using the following snippet to write to ES:

// input here is an RDD of JSON strings
EsSpark.saveJsonToEs(input, "offers/product", Map[String, String]("" -> "lumi_name"))

There seem to be some data strings that cause the job to fail:

6/11/21 15:21:27 ERROR Executor: Exception in task 5021.3 in stage 1.0 (TID 42312)
org.elasticsearch.hadoop.serialization.EsHadoopSerializationException: org.codehaus.jackson.JsonParseException: Unexpected character ('h' (code 104)): was expecting comma to separate OBJECT entries
at [Source: [B@1a1d6f03; line: 1, column: 20]
at org.elasticsearch.hadoop.serialization.json.JacksonJsonParser.nextToken(
at org.elasticsearch.hadoop.serialization.ParsingUtils.doFind(
at org.elasticsearch.hadoop.serialization.ParsingUtils.values(
at org.elasticsearch.hadoop.serialization.field.JsonFieldExtractors.process(
at org.elasticsearch.hadoop.serialization.bulk.JsonTemplatedBulk.preProcess(
at org.elasticsearch.hadoop.serialization.bulk.TemplatedBulk.write(

Is there a way I can catch this error while writing to ES, so that the job doesn't fail and I can handle the bad data gracefully?


(James Baiera) #2

@sushant Right now the connector does not provide any failure hooks for bad data. I would instead recommend preemptively checking and remediating the JSON before it is sent out to the connector.
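
The pre-validation suggested above can be sketched as a filter over the records before they reach the connector. This is shown in plain Python for illustration (the `is_valid_json` helper is hypothetical); in the actual Spark job the same idea would be a Scala filter on the RDD, using a JSON parser such as Jackson, applied before calling EsSpark.saveJsonToEs:

```python
import json

def is_valid_json(s):
    """Return True if s parses as JSON, False otherwise."""
    try:
        json.loads(s)
        return True
    except ValueError:
        return False

# In Spark this would be rdd.filter(is_valid_json) on the input RDD.
records = ['{"name": "lamp"}', '{"name" h "broken"}']
good = [r for r in records if is_valid_json(r)]      # safe to send to ES
bad = [r for r in records if not is_valid_json(r)]   # log or route elsewhere
```

Splitting the input this way lets the valid records reach saveJsonToEs while the malformed ones are collected for logging or repair, instead of failing the whole task.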

(Sushant Hiray) #3

@james.baiera Thanks for the clarification.

(system) #4

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.