I have a Hive table (managed, not external) called "default.reading". It contains two columns, "name" (String) and "favoritebook" (Array of String), with the following data:
name | favoritebook
Tom | ["Book1" , "Book2"]
Abby | ["Book3" , "Book4"]
Note that the table has no ID column; these two columns are all we have. I am trying to insert the data above into Elasticsearch with this Pig script:
-------------------------Script begins------------------------------------------------
SET hive.metastore.uris 'thrift://node:9000';
REGISTER hdfs://node:9001/library/elasticsearch-hadoop-5.0.0.jar;
DEFINE HCatLoader org.apache.hive.hcatalog.pig.HCatLoader();
DEFINE EsStore org.elasticsearch.hadoop.pig.EsStorage(
'es.nodes = elasticsearch.service.consul',
'es.port = 9200',
'es.write.operation = upsert',
'es.mapping.pig.tuple.use.field.names=true'
);
hivetable = LOAD 'default.reading' USING HCatLoader();
hivetable_flat = FOREACH hivetable GENERATE name AS name, favoritebook as favoriteBook;
STORE hivetable_flat INTO 'readings/reading' USING EsStore();
-------------------------Script Ends------------------------------------------------
When I run the script above, I get this error: "Error: org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Operation [update] requires an id but none was given/found"
Note: I chose upsert so that we can incrementally load more data into the index afterwards.
- Can you see any issue with my Pig script? Am I doing anything wrong?
- If adding 'es.mapping.id' is required for upsert, what should its value be, given that there is no ID column in the Hive table?
- favoritebook is an Array of String. I have also seen the error below, so I wonder how I should express this column in the script:
ERROR 2999: Unexpected internal error. Found unrecoverable error [ip:port] returned Bad Request(400) - failed to parse [favoriteBook]; Bailing out..
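To make the second question concrete, here is a minimal sketch of the 'es.mapping.id' option. It assumes that "name" is unique per row, which the table does not guarantee. 'es.mapping.id' names an existing field whose value becomes the Elasticsearch document ID; it does not generate an ID. The alias EsStoreById is a hypothetical name used only for illustration, and the array column is left out so that the sketch covers the ID question alone.

```pig
-- Sketch only: assumes "name" is unique per row (an assumption, not a fact about the table).
SET hive.metastore.uris 'thrift://node:9000';
REGISTER hdfs://node:9001/library/elasticsearch-hadoop-5.0.0.jar;

-- EsStoreById is a hypothetical alias; es.mapping.id points at the field to use as the document ID.
DEFINE EsStoreById org.elasticsearch.hadoop.pig.EsStorage(
    'es.nodes=elasticsearch.service.consul',
    'es.port=9200',
    'es.write.operation=upsert',
    'es.mapping.id=name'
);

hivetable = LOAD 'default.reading' USING org.apache.hive.hcatalog.pig.HCatLoader();

-- Only the name column is written here; favoritebook is deliberately omitted from this sketch.
names_only = FOREACH hivetable GENERATE name AS name;

STORE names_only INTO 'readings/reading' USING EsStoreById();
```

If "name" is not unique, rows sharing a name would update the same document under upsert, so a different key would be needed.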
Thank you!