I am writing a Spark application in which I want to use ES with Hadoop. I have a
lot of documents in ES that I want to aggregate, but I can't.
My documents have different fields: some have a "twitter" field with values,
some have "facebook", etc.
When I try to read the data from ES, I get an exception:
java.lang.NullPointerException
    at org.elasticsearch.hadoop.serialization.dto.mapping.Field.add(Field.java:110)
    at org.elasticsearch.hadoop.serialization.dto.mapping.Field.add(Field.java:111)
    at org.elasticsearch.hadoop.serialization.dto.mapping.Field.add(Field.java:111)
    at org.elasticsearch.hadoop.serialization.dto.mapping.Field.add(Field.java:111)
    at org.elasticsearch.hadoop.serialization.dto.mapping.Field.toLookupMap(Field.java:98)
    at org.elasticsearch.hadoop.serialization.ScrollReader.<init>(ScrollReader.java:61)
    at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.init(EsInputFormat.java:223)
    at org.elasticsearch.hadoop.mr.EsInputFormat$WritableShardRecordReader.init(EsInputFormat.java:367)
    at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.<init>(EsInputFormat.java:183)
    at org.elasticsearch.hadoop.mr.EsInputFormat$WritableShardRecordReader.<init>(EsInputFormat.java:359)
    at org.elasticsearch.hadoop.mr.EsInputFormat.getRecordReader(EsInputFormat.java:498)
    at org.elasticsearch.hadoop.mr.EsInputFormat.getRecordReader(EsInputFormat.java:72)
My questions:
How can I read back the raw JSON from an ES query without ES-Hadoop trying to
deserialize it (I want to deserialize it manually)?
If I can't do that: ES returns an "Object" in this field's mapping, and the
JSON contains an empty object "{}". How can I ignore this?
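For the second question, one client-side option is to strip empty objects during manual deserialization. The following is only a minimal sketch, assuming each hit is already available as a raw JSON string; it does not touch ES-Hadoop itself, and the helper names `drop_empty_objects` and `parse_hit` as well as the sample document are hypothetical.

```python
import json


def drop_empty_objects(value):
    """Recursively drop dict entries whose value is an empty object ({}).

    An object that becomes empty after its own children are cleaned is
    dropped as well. Empty objects inside lists are left untouched.
    """
    if isinstance(value, dict):
        cleaned = {}
        for key, child in value.items():
            child = drop_empty_objects(child)
            if isinstance(child, dict) and not child:
                continue  # skip "{}" values such as an unused "twitter" field
            cleaned[key] = child
        return cleaned
    if isinstance(value, list):
        return [drop_empty_objects(item) for item in value]
    return value


def parse_hit(raw_json):
    """Parse one raw JSON document and remove its empty objects."""
    return drop_empty_objects(json.loads(raw_json))


# Hypothetical document: "twitter" is mapped as an object but holds no data.
raw = '{"name": "example", "twitter": {}, "facebook": {"likes": 3, "meta": {}}}'
doc = parse_hit(raw)
```

In a Spark job, a function like `parse_hit` could be applied per record once the documents are available as JSON strings; whether that is possible depends on the ES-Hadoop version in use.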
Issue #231, which I believe you raised, has been fixed in 2.x. Can you please try the latest 2.0.1.BUILD-SNAPSHOT
and report back?
Thanks!
On 7/15/14 9:32 AM, János Háber wrote:
[quoted original message trimmed]