Export an Elasticsearch index to a JSON file with Pig

(Thomas Decaux) #1

Using the elasticsearch-hadoop library, I want to export an Elasticsearch index to JSON, so I tried:

A = LOAD 'events-sample/events'
USING org.elasticsearch.hadoop.pig.EsStorage(...);

STORE A INTO '/user/admin/toto.json' USING JsonStorage();

But JsonStorage complains about a missing schema, so I tried several schemas:

... AS (chararray), ... AS (line:map[chararray]), etc.

Then I got the following Java exception from EsStorage.java:

Caused by: java.lang.ClassCastException: java.lang.String cannot be cast to java.util.Map
at org.elasticsearch.hadoop.pig.EsStorage.getNext(EsStorage.java:271)
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader.nextKeyValue(PigRecordReader.java:204)
at org.apache.tez.mapreduce.lib.MRReaderMapReduce.next(MRReaderMapReduce.java:116)
at org.apache.pig.backend.hadoop.executionengine.tez.plan.operator.POSimpleTezLoad.getNextTuple(POSimpleTezLoad.java:110)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:307)
at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNextTuple(POForEach.java:252)
... 17 more
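For comparison, here is a minimal sketch of the usual pattern, assuming the elasticsearch-hadoop jar is already registered and es.nodes points at the cluster: declare an explicit Pig schema whose fields match the document fields, so that JsonStorage has field names to use as JSON keys. The field names user and message are hypothetical placeholders. This has not been checked against the elasticsearch-hadoop version used in this thread, so it may or may not avoid the ClassCastException above.

```pig
-- Hypothetical schema: replace user and message with fields that actually
-- exist in the events-sample/events mapping.
A = LOAD 'events-sample/events'
    USING org.elasticsearch.hadoop.pig.EsStorage()
    AS (user:chararray, message:chararray);

-- JsonStorage writes one JSON object per record, using the declared
-- field names as keys.
STORE A INTO '/user/admin/toto.json' USING JsonStorage();
```

Note that STORE ... INTO names an HDFS output directory: Pig writes part files inside /user/admin/toto.json rather than a single JSON file.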

I find the documentation lacks real examples of using Elasticsearch on Hadoop for people who know ES well but are new to Hadoop.

Thank you,

(James Baiera) #2

Thanks for posting this. I have also run into the issues you describe. I've opened an issue in the repository to track this: https://github.com/elastic/elasticsearch-hadoop/issues/871

(Thomas Decaux) #3

Seriously, is anyone using Elasticsearch for Hadoop in production? I am very curious to see a real project that uses it.

Many thanks for opening the issue.
