I'm trying to dump my Elasticsearch index to Hadoop so I can work on the data without putting any more load on my cluster.
I wrote a simple MapReduce job to drop the data to HDFS, but it fails on arrays. My ES documents contain array fields, and an array may sometimes be null or empty.
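For reference, my job is roughly like this (a simplified sketch; the node address, index name, and output path are placeholders, not my real values):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.elasticsearch.hadoop.mr.EsInputFormat;
import org.elasticsearch.hadoop.mr.LinkedMapWritable;

public class EsDump {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("es.nodes", "localhost:9200");    // placeholder ES address
        conf.set("es.resource", "myindex/mytype"); // placeholder index/type

        Job job = Job.getInstance(conf, "es-dump");
        job.setJarByClass(EsDump.class);
        job.setInputFormatClass(EsInputFormat.class);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LinkedMapWritable.class);
        job.setMapperClass(Mapper.class);          // identity mapper: pass documents through
        job.setNumReduceTasks(0);                  // map-only dump, written directly to HDFS
        SequenceFileOutputFormat.setOutputPath(job, new Path("/user/me/es-dump"));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}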
When the job runs, I get this error:
2016-01-20 18:33:22,679 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.NullPointerException
at org.apache.hadoop.io.ArrayWritable.write(ArrayWritable.java:105)
at org.elasticsearch.hadoop.mr.WritableArrayWritable.write(WritableArrayWritable.java:60)
at org.apache.hadoop.io.MapWritable.write(MapWritable.java:161)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:98)
at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:82)
at org.apache.hadoop.io.SequenceFile$Writer.append(SequenceFile.java:1329)
at org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat$1.write(SequenceFileOutputFormat.java:83)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:658)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
It's not a compatibility issue; it's a bug in Hadoop (empty ArrayWritables, which are valid and can be constructed, cannot be serialized). I've pushed a fix for this in master; the related issue can be found here:
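Here's a minimal repro of the Hadoop bug, outside of MapReduce (just an illustration of the failure, not the actual fix):

import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.DataOutputBuffer;
import org.apache.hadoop.io.Text;

public class EmptyArrayWritableRepro {
    public static void main(String[] args) throws Exception {
        // Legal to construct: the backing Writable[] stays null until set() is called.
        ArrayWritable empty = new ArrayWritable(Text.class);

        // Serializing it NPEs on values.length, exactly as in the stack trace above.
        empty.write(new DataOutputBuffer());
    }
}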
Actually, your fix doesn't seem to work for me. I'm now hitting this:
2016-02-08 12:04:34,101 WARN [main] org.apache.hadoop.mapred.YarnChild: Exception running child : java.lang.NullPointerException
at org.apache.hadoop.io.ArrayWritable.write(ArrayWritable.java:105)
at org.elasticsearch.hadoop.mr.WritableArrayWritable.write(WritableArrayWritable.java:60)
at org.apache.hadoop.io.MapWritable.write(MapWritable.java:161)
at org.apache.hadoop.io.AbstractMapWritable.copy(AbstractMapWritable.java:115)
at org.apache.hadoop.io.MapWritable.&lt;init&gt;(MapWritable.java:55)
I'm using version 2.2.0 from the Maven repository.
It still ends up in this part:
@Override
public void write(DataOutput out) throws IOException {
  out.writeInt(values.length); // write values -- throws the NPE here when 'values' is null
  for (int i = 0; i < values.length; i++) {
    values[i].write(out); // would also NPE here if an individual element is null
  }
}
I don't understand how to handle this. The only workaround I found was to clean up my ES index beforehand, but that is painful to do and not reliable.
Whenever a document has an array field that contains a null value, I get this error. It's actually a new error; this is the first time I've seen this one.
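For now I'm considering a mapper-side workaround along these lines (the sanitize helper is my own hypothetical code, not part of es-hadoop): strip null array elements and null backing arrays from each document before it is written out:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.io.ArrayWritable;
import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Writable;

public final class Sanitizer {
    // Hypothetical helper: make ArrayWritable values safe to serialize by
    // removing null elements and replacing a null backing array with an empty one.
    public static void sanitize(MapWritable doc) {
        for (Map.Entry<Writable, Writable> e : doc.entrySet()) {
            Writable v = e.getValue();
            if (v instanceof ArrayWritable) {
                ArrayWritable aw = (ArrayWritable) v;
                Writable[] values = aw.get();
                if (values == null) {
                    aw.set(new Writable[0]); // null backing array -> empty array
                } else {
                    List<Writable> kept = new ArrayList<>();
                    for (Writable w : values) {
                        if (w != null) {
                            kept.add(w); // drop null elements
                        }
                    }
                    aw.set(kept.toArray(new Writable[0]));
                }
            } else if (v instanceof MapWritable) {
                sanitize((MapWritable) v); // recurse into nested objects
            }
        }
    }
}

Calling Sanitizer.sanitize(value) in map() right before context.write(key, value) should at least avoid the NPE during output serialization, at the cost of silently dropping the null entries.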