ElasticSearch+Hadoop+Spark

Janos_Haber · July 15, 2014, 6:32am

Hi guys,

I'm writing a Spark application where I want to use ES with Hadoop. I have a
lot of documents in ES that I want to aggregate, but I can't.
My documents have different fields: some have a "twitter" field with values,
some have a "facebook" field, etc.

When I try to read the data from ES, I get an exception:
java.lang.NullPointerException
    at org.elasticsearch.hadoop.serialization.dto.mapping.Field.add(Field.java:110)
    at org.elasticsearch.hadoop.serialization.dto.mapping.Field.add(Field.java:111)
    at org.elasticsearch.hadoop.serialization.dto.mapping.Field.add(Field.java:111)
    at org.elasticsearch.hadoop.serialization.dto.mapping.Field.add(Field.java:111)
    at org.elasticsearch.hadoop.serialization.dto.mapping.Field.toLookupMap(Field.java:98)
    at org.elasticsearch.hadoop.serialization.ScrollReader.<init>(ScrollReader.java:61)
    at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.init(EsInputFormat.java:223)
    at org.elasticsearch.hadoop.mr.EsInputFormat$WritableShardRecordReader.init(EsInputFormat.java:367)
    at org.elasticsearch.hadoop.mr.EsInputFormat$ShardRecordReader.<init>(EsInputFormat.java:183)
    at org.elasticsearch.hadoop.mr.EsInputFormat$WritableShardRecordReader.<init>(EsInputFormat.java:359)
    at org.elasticsearch.hadoop.mr.EsInputFormat.getRecordReader(EsInputFormat.java:498)
    at org.elasticsearch.hadoop.mr.EsInputFormat.getRecordReader(EsInputFormat.java:72)

My questions:

How can I read back the raw JSON from an ES query without ES-Hadoop trying to
deserialize it (I want to deserialize it manually)?
If I can't do that: ES returns "object" as the mapping for this field, and the
JSON contains an empty object "{}". How can I ignore it?

Thanks

b0c1

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/6b31f532-1798-44f8-913a-0b56dbe2d2dd%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
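The manual-deserialization route asked about above can be sketched as follows. This is a minimal Python sketch, not ES-Hadoop code: it assumes each hit has already been obtained as a raw JSON string (later ES-Hadoop releases, 2.1 onward, document an `es.output.json` setting that returns hits in that form; it is not part of the 2.0.x version discussed in this thread). The helper names `prune_empty` and `deserialize` are hypothetical. The sketch parses one document and drops every key whose value is an empty object "{}", including objects that only become empty after their own empty children are removed, so sparse fields such as "twitter" or "facebook" simply disappear when they carry no data.

```python
import json
from typing import Any


def prune_empty(value: Any) -> Any:
    """Recursively remove empty objects ({}) from parsed JSON."""
    if isinstance(value, dict):
        cleaned = {}
        for key, child in value.items():
            pruned = prune_empty(child)
            # Skip keys whose value is (or became) an empty object,
            # e.g. "facebook": {} on a document without Facebook data.
            if isinstance(pruned, dict) and not pruned:
                continue
            cleaned[key] = pruned
        return cleaned
    if isinstance(value, list):
        # Prune each element, then drop elements that are empty objects.
        items = (prune_empty(item) for item in value)
        return [item for item in items if not (isinstance(item, dict) and not item)]
    # Strings, numbers, booleans and null are returned unchanged.
    return value


def deserialize(raw_json: str) -> dict:
    """Parse one raw document (assumed to be a JSON object) and prune empty objects."""
    return prune_empty(json.loads(raw_json))


if __name__ == "__main__":
    hit = '{"user": "a", "twitter": {"followers": 10}, "facebook": {}}'
    print(deserialize(hit))
```

Running the usage example prints `{'user': 'a', 'twitter': {'followers': 10}}`. Pruning after parsing, rather than relying on the index mapping, keeps the cleanup independent of whether a field is mapped as an empty "object" in ES.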

costin · July 15, 2014, 1:57pm

Hi,

Issue #231, which I believe you raised, has been fixed in 2.x. Can you please try the latest 2.0.1.BUILD-SNAPSHOT
and report back?

Thanks!

--
Costin


Related topics

Topic	Category	Replies	Views	Activity
NullPointerException when settings "es.read.field.as.array.include" options	Elasticsearch, es-hadoop	7	1659	July 6, 2017
Elastic Search Hadoop Connector - Spark Facing Issues while Saving to ES	Elasticsearch, es-hadoop	4	1823	July 6, 2017
Handling array values while reading from elasticsearch in spark using elasticsearch-spark	Elasticsearch, es-hadoop	1	938	November 19, 2020
Serialization issue on arrays	Elasticsearch, es-hadoop	9	2878	July 6, 2017
Value got nulled when ingesting to ES from Hadoop using Spark	Elasticsearch, es-hadoop	1	393	January 6, 2021