I am trying to index some geo_point data using Apache Spark (via the elasticsearch-hadoop connector), and the operation fails without giving me any information about what is wrong or how to fix it. When I index the same field as a string, everything checks out. Any clues on how to configure a bulk error handler that prints a meaningful message and moves on? Currently the Spark job crashes after some time.
I also have ignore_malformed set to true on the Elasticsearch side, so I'm guessing this is happening on the Spark side of things and not in Elasticsearch. The relevant index settings:

"settings": {
    "number_of_replicas": 0,
    "index.mapping.ignore_malformed": "true"
}
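
For reference, the write itself is essentially the following (simplified; the index, field, and path names here are illustrative, not my exact job):

import org.apache.spark.sql.SparkSession
import org.elasticsearch.spark.sql._

val spark = SparkSession.builder().appName("geo-indexing").getOrCreate()

// "location" is the column mapped as geo_point on the index side;
// everything indexes fine when that mapping is string instead.
val df = spark.read.json("hdfs:///data/events.json")

df.saveToEs("events/doc", Map(
  "es.nodes" -> "localhost:9200"
))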
Here is a fun stacktrace:
ERROR Executor: Exception in task 7.1 in stage 0.0 (TID 13)
org.elasticsearch.hadoop.EsHadoopException: Could not write all entries for bulk operation [1/100]. Error sample (first [5] error messages):
failed to parse
Bailing out...
at org.elasticsearch.hadoop.rest.bulk.BulkProcessor.flush(BulkProcessor.java:475)
at org.elasticsearch.hadoop.rest.bulk.BulkProcessor.add(BulkProcessor.java:120)
at org.elasticsearch.hadoop.rest.RestRepository.doWriteToIndex(RestRepository.java:187)
at org.elasticsearch.hadoop.rest.RestRepository.writeToIndex(RestRepository.java:168)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:67)
at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:101)
at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:101)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
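
For what it's worth, the closest thing I have found so far is the connector's error handler hooks (added in es-hadoop 6.2, as far as I can tell). Below is my untested sketch of a handler that logs the rejected entry and moves on; the class and package names come from the es-hadoop error-handler docs, the rest is my guess:

import org.apache.commons.logging.LogFactory
import org.elasticsearch.hadoop.handler.HandlerResult
import org.elasticsearch.hadoop.rest.bulk.handler.{BulkWriteErrorHandler, BulkWriteFailure, DelayableErrorCollector}

// Log each rejected bulk entry and mark it handled, so the batch keeps
// going instead of "Bailing out..." and killing the task.
class LogAndSkipHandler extends BulkWriteErrorHandler {
  private val log = LogFactory.getLog(classOf[LogAndSkipHandler])

  override def onError(entry: BulkWriteFailure,
                       collector: DelayableErrorCollector[Array[Byte]]): HandlerResult = {
    log.warn(s"Skipping doc, HTTP ${entry.getResponseCode}: ${entry.getException.getMessage}")
    HandlerResult.HANDLED // drop this document, continue with the rest
  }
}

which I would wire in through the write options (the handler name "skip" is arbitrary):

"es.write.rest.error.handlers" -> "skip",
"es.write.rest.error.handler.skip" -> "com.example.LogAndSkipHandler"

Is that the intended approach, or is there a simpler built-in (the docs also mention a "log" handler) that just logs and drops the malformed documents?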