Log:
17/11/28 13:52:52 ERROR TaskContextImpl: Error in TaskCompletionListener
org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest: Found unrecoverable error [x.x.x.x:9205] returned Internal Server Error(500) - compound sub-files must have a valid codec header and footer: file is too small (0 bytes) (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/data1/elasticsearch/data/nodes/0/indices/S93P5ab_S42YDhmslWDKAQ/33/index/_9qa_Lucene50_0.doc"))); Bailing out..
at org.elasticsearch.hadoop.rest.RestClient.processBulkResponse(RestClient.java:251)
at org.elasticsearch.hadoop.rest.RestClient.bulk(RestClient.java:203)
at org.elasticsearch.hadoop.rest.RestRepository.tryFlush(RestRepository.java:248)
at org.elasticsearch.hadoop.rest.RestRepository.flush(RestRepository.java:270)
at org.elasticsearch.hadoop.rest.RestRepository.close(RestRepository.java:295)
at org.elasticsearch.hadoop.rest.RestService$PartitionWriter.close(RestService.java:121)
at org.elasticsearch.spark.rdd.EsRDDWriter$$anonfun$write$1.apply(EsRDDWriter.scala:60)
at org.elasticsearch.spark.rdd.EsRDDWriter$$anonfun$write$1.apply(EsRDDWriter.scala:60)
at org.apache.spark.TaskContext$$anon$1.onTaskCompletion(TaskContext.scala:123)
at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:97)
at org.apache.spark.TaskContextImpl$$anonfun$markTaskCompleted$1.apply(TaskContextImpl.scala:95)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:95)
at org.apache.spark.scheduler.Task.run(Task.scala:112)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
17/11/28 13:52:52 ERROR Executor: Exception in task 5980.0 in stage 1.0 (TID 7319)
org.apache.spark.util.TaskCompletionListenerException: Found unrecoverable error [x.x.x.x:9205] returned Internal Server Error(500) - compound sub-files must have a valid codec header and footer: file is too small (0 bytes) (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/data1/elasticsearch/data/nodes/0/indices/S93P5ab_S42YDhmslWDKAQ/33/index/_9qa_Lucene50_0.doc"))); Bailing out..
at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:105)
at org.apache.spark.scheduler.Task.run(Task.scala:112)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
My cluster has 5 SSD servers running CentOS 7. Each server hosts 6 Elasticsearch instances, and all of those instances are data nodes. In addition, 3 virtual machines are master nodes and 3 virtual machines are ingest nodes.
Can you check the log files of the affected node for any exceptions or corruption? It looks to me as if some data on that node is corrupted, because one of the files is 0 bytes, as mentioned at the top.
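If it helps, a quick scan of the node logs could look roughly like this. This is only a minimal sketch: the `/var/log/elasticsearch/*.log` location and the keyword list are assumptions, so adjust them to your `path.logs` setting and Elasticsearch version.

```python
import glob
import re

# Assumed log location and corruption-related keywords; adjust to your setup.
LOG_GLOB = "/var/log/elasticsearch/*.log"
PATTERNS = re.compile(
    r"CorruptIndexException|checksum failed|codec header|file is too small|failed to recover",
    re.IGNORECASE,
)

for path in glob.glob(LOG_GLOB):
    with open(path, errors="replace") as log_file:
        for line_no, line in enumerate(log_file, start=1):
            if PATTERNS.search(line):
                # Print the file, line number, and the suspicious log line.
                print(f"{path}:{line_no}: {line.rstrip()}")
```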
I found a workaround. When the file system is ext4 or xfs, this issue always happens, but after I changed it to btrfs the issue went away. So I think it is related to the SSD, the file system, and the OS.
I think the majority of users run on ext4 or xfs rather than btrfs. Are you running any special Linux distribution? Also, is this an NFS volume? Curious about your setup...
The OS is "CentOS Linux release 7.2.1511 (Core)". After running over the weekend, btrfs also has the issue: the shards become unassigned. The volumes are not NFS. I have no clue about this issue now. My setup consists of 3 master nodes, 2 ingest nodes, and 30 data nodes; the 30 data nodes are deployed on 5 SSD machines. All roles run on CentOS 7.
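For reference, the unassigned shards and the reason the cluster records for them can be listed along these lines. This is just a sketch using the standard `_cat/shards` and `_cluster/allocation/explain` APIs; `http://localhost:9200` is a placeholder for one of the cluster's nodes.

```python
import json
import requests

# Placeholder endpoint; point this at any node of the affected cluster.
ES = "http://localhost:9200"

# List all shards that are currently UNASSIGNED, with the recorded reason.
shards = requests.get(
    f"{ES}/_cat/shards",
    params={"format": "json", "h": "index,shard,prirep,state,unassigned.reason"},
).json()
unassigned = [s for s in shards if s["state"] == "UNASSIGNED"]
for s in unassigned:
    print(s["index"], s["shard"], s["prirep"], s["unassigned.reason"])

# Ask the cluster for a detailed allocation explanation of the first unassigned shard.
if unassigned:
    explain = requests.post(
        f"{ES}/_cluster/allocation/explain",
        json={
            "index": unassigned[0]["index"],
            "shard": int(unassigned[0]["shard"]),
            "primary": unassigned[0]["prirep"] == "p",
        },
    ).json()
    print(json.dumps(explain, indent=2))
```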
Have you tried running on another physical hard disk on the same node and checked whether this keeps happening? I'd slowly start to consider a hardware failure here, unless there is some fancy script that nulls out your files...
Have you checked the dmesg output for anything that might indicate a hardware failure?
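Something like the following could help sift the kernel log for disk trouble. It is only a sketch: the keywords are assumptions, and tools like smartctl (if installed) will give a fuller picture of drive health.

```python
import re
import subprocess

# Read the kernel ring buffer; this usually needs appropriate privileges.
dmesg = subprocess.run(["dmesg"], capture_output=True, text=True, check=True).stdout

# Keywords that commonly appear around failing disks or file-system errors (assumed list).
suspicious = re.compile(
    r"I/O error|ata\d+.*(error|failed)|EXT4-fs error|XFS.*(corrupt|error)|BTRFS.*(error|corrupt)",
    re.IGNORECASE,
)

for line in dmesg.splitlines():
    if suspicious.search(line):
        print(line)
```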