I am trying to index some geo_point data using Apache Spark (via the elasticsearch-hadoop connector), and the operation fails without giving me any information about what is wrong or how to fix it. When I index the same field as a string, everything checks out. Any clues on how to configure a bulk error handler that prints a meaningful message and moves on? Currently the Spark job crashes after some time.
I also have ignore_malformed set to true on the Elasticsearch side, so I'm guessing this is happening on the Spark side of things and not in Elasticsearch. The relevant index settings:

"settings": {
    "number_of_replicas": 0,
    "index.mapping.ignore_malformed": "true"
}
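
For reference, the write itself is essentially the following (simplified; the index, field, and path names here are illustrative, not my exact job):

import org.apache.spark.sql.SparkSession
import org.elasticsearch.spark.sql._

val spark = SparkSession.builder().appName("geo-indexing").getOrCreate()

// "location" is the column mapped as geo_point on the index side;
// everything indexes fine when that mapping is string instead.
val df = spark.read.json("hdfs:///data/events.json")

df.saveToEs("events/doc", Map(
  "es.nodes" -> "localhost:9200"
))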
Here is a fun stacktrace:
ERROR Executor: Exception in task 7.1 in stage 0.0 (TID 13)
org.elasticsearch.hadoop.EsHadoopException: Could not write all entries for bulk operation [1/100]. Error sample (first [5] error messages):
failed to parse
Bailing out...
at org.elasticsearch.hadoop.rest.bulk.BulkProcessor.flush(BulkProcessor.java:475)
at org.elasticsearch.hadoop.rest.bulk.BulkProcessor.add(BulkProcessor.java:120)
at org.elasticsearch.hadoop.rest.RestRepository.doWriteToIndex(RestRepository.java:187)
at org.elasticsearch.hadoop.rest.RestRepository.writeToIndex(RestRepository.java:168)
at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:67)
at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:101)
at org.elasticsearch.spark.sql.EsSparkSQL$$anonfun$saveToEs$1.apply(EsSparkSQL.scala:101)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
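
For what it's worth, the closest thing I have found so far is the connector's error handler hooks (added in es-hadoop 6.2, as far as I can tell). Below is my untested sketch of a handler that logs the rejected entry and moves on; the class and package names come from the es-hadoop error-handler docs, the rest is my guess:

import org.apache.commons.logging.LogFactory
import org.elasticsearch.hadoop.handler.HandlerResult
import org.elasticsearch.hadoop.rest.bulk.handler.{BulkWriteErrorHandler, BulkWriteFailure, DelayableErrorCollector}

// Log each rejected bulk entry and mark it handled, so the batch keeps
// going instead of "Bailing out..." and killing the task.
class LogAndSkipHandler extends BulkWriteErrorHandler {
  private val log = LogFactory.getLog(classOf[LogAndSkipHandler])

  override def onError(entry: BulkWriteFailure,
                       collector: DelayableErrorCollector[Array[Byte]]): HandlerResult = {
    log.warn(s"Skipping doc, HTTP ${entry.getResponseCode}: ${entry.getException.getMessage}")
    HandlerResult.HANDLED // drop this document, continue with the rest
  }
}

which I would wire in through the write options (the handler name "skip" is arbitrary):

"es.write.rest.error.handlers" -> "skip",
"es.write.rest.error.handler.skip" -> "com.example.LogAndSkipHandler"

Is that the intended approach, or is there a simpler built-in (the docs also mention a "log" handler) that just logs and drops the malformed documents?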