Catch EsHadoopSerializationException while saving from spark

sushant · November 23, 2016, 6:11pm

Hi,

I'm currently trying to load some data into Elasticsearch. I'm using the following snippet to write to ES:

import org.elasticsearch.spark.rdd.EsSpark

// input is an RDD[String] where each element is one JSON document
EsSpark.saveJsonToEs(input, "offers/product", Map[String, String]("es.mapping.id" -> "lumi_name"))

Some of the data strings seem to cause the job to fail:

6/11/21 15:21:27 ERROR Executor: Exception in task 5021.3 in stage 1.0 (TID 42312)
org.elasticsearch.hadoop.serialization.EsHadoopSerializationException: org.codehaus.jackson.JsonParseException: Unexpected character ('h' (code 104)): was expecting comma to separate OBJECT entries
at [Source: [B@1a1d6f03; line: 1, column: 20]
at org.elasticsearch.hadoop.serialization.json.JacksonJsonParser.nextToken(JacksonJsonParser.java:95)
at org.elasticsearch.hadoop.serialization.ParsingUtils.doFind(ParsingUtils.java:167)
at org.elasticsearch.hadoop.serialization.ParsingUtils.values(ParsingUtils.java:150)
at org.elasticsearch.hadoop.serialization.field.JsonFieldExtractors.process(JsonFieldExtractors.java:201)
at org.elasticsearch.hadoop.serialization.bulk.JsonTemplatedBulk.preProcess(JsonTemplatedBulk.java:64)
at org.elasticsearch.hadoop.serialization.bulk.TemplatedBulk.write(TemplatedBulk.java:54)
at org.elasticsearch.hadoop.rest.RestRepository.writeToIndex(RestRepository.java:158)

Is there a way to catch this error while writing to ES, so that the job doesn't fail and I can handle the bad data gracefully?

Thanks,
Sushant

james.baiera · November 28, 2016, 4:35pm

@sushant Right now the connector does not provide any failure hooks for bad data. I would instead recommend preemptively checking and remediating the JSON before it is sent out to the connector.
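A minimal sketch of the pre-check suggested above, assuming Jackson 2 (`jackson-databind`, which Spark already ships with) is on the classpath. The stack trace shows the parse failing inside `JsonFieldExtractors` while the connector extracts the `es.mapping.id` field, so the sketch checks both that each string parses and that it is an object with a non-null id field. `JsonPrecheck`, `isIndexable`, and `split` are hypothetical names, not part of es-hadoop.

```scala
import com.fasterxml.jackson.databind.ObjectMapper
import scala.util.Try

// Hypothetical helper (not an es-hadoop API): validates JSON before it reaches the connector.
object JsonPrecheck {
  // ObjectMapper is thread-safe once configured, so one shared instance per JVM
  // (i.e. per Spark executor) is enough; `lazy` defers creation until first use.
  private lazy val mapper = new ObjectMapper()

  /** Returns true if `json` parses to a JSON object with a non-null `idField`.
    * Malformed input (the JsonParseException case from the question) yields
    * false instead of throwing. Note: with default settings, readTree does not
    * inspect content that follows the first complete JSON value.
    */
  def isIndexable(json: String, idField: String): Boolean =
    Try(mapper.readTree(json)).toOption.exists { node =>
      // readTree may return null or a MissingNode for empty input, depending on version
      node != null && node.isObject && node.hasNonNull(idField)
    }

  /** Splits records into (indexable, rejected) so the rejects can be logged or stored. */
  def split(records: Seq[String], idField: String): (Seq[String], Seq[String]) =
    records.partition(isIndexable(_, idField))
}
```

On the Spark side, the same predicate could be applied with `input.filter(JsonPrecheck.isIndexable(_, "lumi_name"))` before calling `EsSpark.saveJsonToEs`, with the negated filter written somewhere for later inspection. Because `JsonPrecheck` is a Scala `object`, it is initialized on each executor rather than serialized with the closure.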

sushant · November 29, 2016, 3:50am

@james.baiera Thanks for the clarification.

system · December 27, 2016, 3:50am

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Related topics

Topic	Category	Replies	Views	Activity
saveToEsWithMeta with Json values	Elasticsearch es-hadoop	2	1112	July 6, 2017
ElasticSearch hadoop - .EsHadoopSerializationException	Elasticsearch	5	919	July 6, 2017
Catching exceptions from saveToEs (elasticsearch-spark)	Elasticsearch es-hadoop	5	2430	July 6, 2017
How to handle data that causes failure while indexing from spark to ES	Elasticsearch es-hadoop	2	1980	October 10, 2017
ES JsonParseException	Elasticsearch	4	1310	July 6, 2017