EOFException while reading SequenceFile to restore ES

Hi,

I've been trying to back up an ES cluster and restore it using Hadoop and the ES-Hadoop library. I'm using ES 1.7.1.

Because of the size of the data and the network speed, I bucketed the operation by date ranges. For some buckets, everything works fine.
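For context, the restore side is basically the stock ES-Hadoop MapReduce setup. Here is a minimal sketch of the driver, not my actual job; the node address, index name and class names are placeholders:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.elasticsearch.hadoop.mr.EsOutputFormat;

public class RestoreBucket {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setBoolean("mapreduce.map.speculative", false); // ES-Hadoop recommends disabling speculative execution
        conf.set("es.nodes", "es-node:9200");                // placeholder
        conf.set("es.resource", "my-index/my-type");         // placeholder index/type

        Job job = Job.getInstance(conf, "es-restore-bucket");
        job.setJarByClass(RestoreBucket.class);
        job.setInputFormatClass(SequenceFileInputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0])); // one date bucket, e.g. .../2014/7
        job.setMapperClass(Mapper.class);                     // identity mapper: pass Text/MapWritable through
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(MapWritable.class);
        job.setNumReduceTasks(0);
        job.setOutputFormatClass(EsOutputFormat.class);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```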

But for some buckets, I can't restore the data into ES. I get this error when reading the sequence files:

```
2015-12-30 11:43:31,202 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: hdfs://.../2014/7/part-m-00011:0+134217728
2015-12-30 11:43:35,435 INFO [main] org.apache.hadoop.mapred.MapTask: (EQUATOR) 0 kvi 300417020(1201668080)
2015-12-30 11:43:35,435 INFO [main] org.apache.hadoop.mapred.MapTask: mapreduce.task.io.sort.mb: 1146
2015-12-30 11:43:35,435 INFO [main] org.apache.hadoop.mapred.MapTask: soft limit at 841167680
2015-12-30 11:43:35,435 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufvoid = 1201668096
2015-12-30 11:43:35,435 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 300417020; length = 75104256
2015-12-30 11:43:35,448 INFO [main] org.apache.hadoop.mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2015-12-30 11:43:35,904 INFO [main] org.apache.hadoop.mapred.MapTask: Starting flush of map output
2015-12-30 11:43:35,905 INFO [main] org.apache.hadoop.mapred.MapTask: Spilling map output
2015-12-30 11:43:35,905 INFO [main] org.apache.hadoop.mapred.MapTask: bufstart = 0; bufend = 106017; bufvoid = 1201668096
2015-12-30 11:43:35,905 INFO [main] org.apache.hadoop.mapred.MapTask: kvstart = 300417020(1201668080); kvend = 300417016(1201668064); length = 5/75104256
2015-12-30 11:43:35,912 INFO [main] org.apache.hadoop.mapred.MapTask: Finished spill 0
2015-12-30 11:43:35,921 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:197)
at org.apache.hadoop.io.Text.readWithKnownLength(Text.java:319)
at org.apache.hadoop.io.Text.readFields(Text.java:291)
at org.apache.hadoop.io.ArrayWritable.readFields(ArrayWritable.java:96)
at org.elasticsearch.hadoop.mr.WritableArrayWritable.readFields(WritableArrayWritable.java:54)
at org.apache.hadoop.io.MapWritable.readFields(MapWritable.java:188)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:71)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:42)
at org.apache.hadoop.io.SequenceFile$Reader.deserializeValue(SequenceFile.java:2247)
at org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:2220)
at org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader.nextKeyValue(SequenceFileRecordReader.java:78)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.nextKeyValue(MapTask.java:556)
at org.apache.hadoop.mapreduce.task.MapContextImpl.nextKeyValue(MapContextImpl.java:80)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.nextKeyValue(WrappedMapper.java:91)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

2015-12-30 11:43:35,925 INFO [main] org.apache.hadoop.mapred.Task: Runnning cleanup for the task
```

The odd thing is that it always occurs on the same files. I suspect the files are somehow corrupted, but I don't know how to repair them (the best solution, of course) or bypass them.

Does anyone have a clue on this?

BR,
Aurelien

I've seen this recently with a sequence file that was exactly 128 MB (the same as the block size). Could you check whether that's the case for the specific file you mention?

Hi Kris,

Thank you for the idea, but no! The failing files are all of different sizes, ranging from 20 MB to 230 MB.

At first I thought my entries were somehow too big and that it was failing on the last entry.
But according to the documentation, a mapper reads the sequence file until it finds the end of a record, crossing block boundaries so that each record is read in full.
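To pin down where exactly it breaks, a standalone reader along these lines (just a sketch) should report how many records are readable and at which byte offset the EOFException fires; the elasticsearch-hadoop jar has to be on the classpath since the values reference WritableArrayWritable:

```java
import java.io.EOFException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class SeqFileCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path(args[0]); // one of the failing part-m-* files
        long records = 0;
        SequenceFile.Reader reader =
                new SequenceFile.Reader(conf, SequenceFile.Reader.file(path));
        try {
            // Instantiate the key/value classes recorded in the file header.
            Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
            Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
            while (reader.next(key, value)) {
                records++;
            }
            System.out.println("Read " + records + " records cleanly from " + path);
        } catch (EOFException e) {
            System.out.println("EOFException after " + records + " records, at byte "
                    + reader.getPosition() + " of " + path);
        } finally {
            reader.close();
        }
    }
}
```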

Aurelien

Repairing data in HDFS is a bit of black magic since, due to replication (3 by default), corruption should not occur in the first place. My advice is to first back the file up, then check it and see whether something sticks out.
Then do the typical 'reboot': delete it and add it again; this should at least refresh the namenode.
You could also dig into the low-level HDFS infrastructure and check whether all the copies/replicas are identical or not.
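For the low-level check, `hdfs fsck <path> -files -blocks -locations` is the usual starting point; programmatically, something along these lines (just a sketch) lists which datanodes hold each block plus the file checksum HDFS reports:

```java
import java.util.Arrays;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockInfo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path path = new Path(args[0]); // the suspect file
        FileSystem fs = path.getFileSystem(conf);

        // Which datanodes hold each block of the file.
        FileStatus status = fs.getFileStatus(path);
        for (BlockLocation loc : fs.getFileBlockLocations(status, 0, status.getLen())) {
            System.out.println("offset=" + loc.getOffset()
                    + " length=" + loc.getLength()
                    + " hosts=" + Arrays.toString(loc.getHosts()));
        }
        // File-level checksum as reported by HDFS; comparing it against a
        // known-good copy of the same data can confirm corruption.
        System.out.println("checksum=" + fs.getFileChecksum(path));
    }
}
```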

Unfortunately the problem is Hadoop-specific, so you might get more information from your distro forums than here.

Thanks for the ideas. I'm digging into this.
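In case it helps someone else, one way to bypass a truncated tail (accepting that everything after the corruption point is lost) would be to wrap SequenceFileInputFormat so that an EOFException ends the split instead of failing the task. An untested sketch:

```java
import java.io.EOFException;
import java.io.IOException;

import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.RecordReader;
import org.apache.hadoop.mapreduce.TaskAttemptContext;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileRecordReader;

// Hypothetical wrapper: reads a sequence file split until a truncated
// record is hit, then stops instead of failing the task.
public class LenientSequenceFileInputFormat
        extends SequenceFileInputFormat<Text, MapWritable> {

    @Override
    public RecordReader<Text, MapWritable> createRecordReader(InputSplit split,
                                                              TaskAttemptContext context) {
        final SequenceFileRecordReader<Text, MapWritable> delegate =
                new SequenceFileRecordReader<Text, MapWritable>();

        return new RecordReader<Text, MapWritable>() {
            public void initialize(InputSplit s, TaskAttemptContext c)
                    throws IOException, InterruptedException {
                delegate.initialize(s, c);
            }
            public boolean nextKeyValue() throws IOException, InterruptedException {
                try {
                    return delegate.nextKeyValue();
                } catch (EOFException e) {
                    return false; // truncated record: end the split instead of crashing
                }
            }
            public Text getCurrentKey() throws IOException, InterruptedException {
                return delegate.getCurrentKey();
            }
            public MapWritable getCurrentValue() throws IOException, InterruptedException {
                return delegate.getCurrentValue();
            }
            public float getProgress() throws IOException, InterruptedException {
                return delegate.getProgress();
            }
            public void close() throws IOException {
                delegate.close();
            }
        };
    }
}
```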