Catch EsHadoopSerializationException while saving from Spark


(Sushant Hiray) #1

Hi,

I'm trying to load some data into Elasticsearch from Spark. I'm using the following snippet to write to ES:

import org.elasticsearch.spark.rdd.EsSpark

// input is an RDD[String] in which each element is one JSON document
EsSpark.saveJsonToEs(input, "offers/product", Map[String, String]("es.mapping.id" -> "lumi_name"))

Some of the input strings appear to be malformed JSON, and they cause the job to fail:

6/11/21 15:21:27 ERROR Executor: Exception in task 5021.3 in stage 1.0 (TID 42312)
org.elasticsearch.hadoop.serialization.EsHadoopSerializationException: org.codehaus.jackson.JsonParseException: Unexpected character ('h' (code 104)): was expecting comma to separate OBJECT entries
at [Source: [B@1a1d6f03; line: 1, column: 20]
at org.elasticsearch.hadoop.serialization.json.JacksonJsonParser.nextToken(JacksonJsonParser.java:95)
at org.elasticsearch.hadoop.serialization.ParsingUtils.doFind(ParsingUtils.java:167)
at org.elasticsearch.hadoop.serialization.ParsingUtils.values(ParsingUtils.java:150)
at org.elasticsearch.hadoop.serialization.field.JsonFieldExtractors.process(JsonFieldExtractors.java:201)
at org.elasticsearch.hadoop.serialization.bulk.JsonTemplatedBulk.preProcess(JsonTemplatedBulk.java:64)
at org.elasticsearch.hadoop.serialization.bulk.TemplatedBulk.write(TemplatedBulk.java:54)
at org.elasticsearch.hadoop.rest.RestRepository.writeToIndex(RestRepository.java:158)

Is there a way to catch this error while writing to ES, so that the job doesn't fail and I can handle the bad data gracefully?

Thanks,
Sushant


(James Baiera) #2

@sushant Right now the connector does not provide any failure hooks for bad data. I would instead recommend preemptively checking and remediating the JSON before it is sent out to the connector.
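
As a rough illustration of checking the JSON before it reaches the connector, here is a minimal sketch. It assumes Jackson 2.x (`com.fasterxml.jackson.databind`) is on the classpath, which is normally true for a Spark application. The names `JsonFilter`, `isJsonObject`, `saveValidOnly` and the `rejectedPath` parameter are made up for this example and are not part of the connector. The check only verifies that each string parses as a JSON object; it does not verify that the `lumi_name` field is present.

```scala
import com.fasterxml.jackson.databind.ObjectMapper
import org.apache.spark.rdd.RDD
import org.elasticsearch.spark.rdd.EsSpark
import scala.util.Try

// Hypothetical helper, not part of elasticsearch-hadoop.
object JsonFilter {
  // One mapper per JVM; ObjectMapper is safe to share across threads for reads.
  private val mapper = new ObjectMapper()

  // True only if the string parses without error and the root is a JSON object.
  // Parse failures (such as the JsonParseException in the question) yield false.
  def isJsonObject(s: String): Boolean =
    Try(mapper.readTree(s)).toOption.exists(n => n != null && n.isObject)

  // Index the parseable documents and write the rest to rejectedPath
  // (a hypothetical location) for later inspection.
  def saveValidOnly(input: RDD[String], rejectedPath: String): Unit = {
    val cached = input.cache() // the RDD is scanned twice below
    val good = cached.filter(s => isJsonObject(s))
    val bad = cached.filter(s => !isJsonObject(s))

    bad.saveAsTextFile(rejectedPath)
    EsSpark.saveJsonToEs(good, "offers/product",
      Map[String, String]("es.mapping.id" -> "lumi_name"))
    cached.unpersist()
  }
}
```

Calling `JsonFilter.saveValidOnly(input, "/tmp/rejected-offers")` (the path is only a placeholder) would then replace the direct `saveJsonToEs` call. Each document is parsed once more than strictly necessary, which trades some CPU time for a job that no longer dies on a single bad record.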


(Sushant Hiray) #3

@james.baiera Thanks for the clarification.

