Nested arrays in hive from MongoDB

Hi,

I am really hoping there is a genius in here that can help me, I've been stuck on this for 3 days now.

I have a MongoDB collection, sample of a document is:

{
"_id" : "58c879787f4fd6a4982f3039",
"Actions" : [
{
"What" : {
"_t" : [
"TargetBase",
"UserTarget",
"CreatedTarget"
],
"Who" : {
"TontoUserId" : 12345,
"EmailAddress" : "someuser@test.com.au",
"Name" : "some-user"
}
},
"When" : ISODate("2017-03-14T23:49:06.585Z"),
"Reason" : "some text",
"Wrapup" : null
}
]
}

I have created an external table in hive:

create external table if not exists x (my_serial string, actions array<struct<what:struct<_t:array, who:struct<tontouserid:string, emailaddress:string, name:string>>, when:string, reason:string, wrapup:string>>)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"my_serial":"_id", "actions":"Actions"}')
TBLPROPERTIES('mongo.uri'='mongodb://testuser:testpwd@test-mongo.test.com.au:27017/somedb.somecollection?authSource=admin');

I don't get any errors when running the above, but when I query the table the only data that I get is the my_serial column:

select * from x limit 1;

58c879787f4fd6a4982f3039 [{"what":null,"when":null,"reason":null,"wrapup":null}]

Similar story when I try to explode the actions array:

select my_serial, actionstuff.what._t
from x
LATERAL VIEW explode(actions) actionstable as actionstuff
limit 1;

If I try a different tack on creating the external table:

create external table if not exists x (my_serial string, actions array)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"my_serial":"_id", "actions":"Actions"}')
TBLPROPERTIES('mongo.uri'='mongodb://testuser:testpwd@test-mongo.test.com.au:27017/somedb.somecollection?authSource=admin');

I get data, but I can't do anything with it:

58c879787f4fd6a4982f3039 ["{ "What" : { "_t" : [ "TargetBase" , "UserTarget" , "CreatedTarget"] , "Who" : { "TontoUserId" : 12345 , "EmailAddress" : "someuser@test.com.au" , "Name" : "some-user"}} , "When" : { "$date" : "2017-03-14T23:15:04.047Z"} , "Reason" : "some text" , "Wrapup" : null }"]

Anyone able to tell me what I'm doing wrong?

Much appreciated.

Hi again,

I didn't get a reply on any of the sites that I posted my question on, but I have worked out a clunky solution that I wish to share in case
anyone else is looking to do something similar.

-- create the external table to the mongo store
create external table if not exists x (my_serial string, actions array)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"my_serial":"_id", "actions":"Actions"}')
TBLPROPERTIES('mongo.uri'='mongodb://testuser:testpwd@test-mongo.test.com.au:27017/somedb.somecollection?authSource=admin');

-- create an intermediary table
create table job_actions_exploded (my_serial string, t string, reason string, when_date timestamp, who_tonto_user_id string);

-- insert into the intermediary table
insert into job_actions_exploded
(my_serial, t, reason, when_date, who_tonto_user_id)
select my_serial
, reverse(split(reverse(regexp_replace(regexp_replace(regexp_replace(get_json_object(single_json_table.single_json, '$.What._t'), '\["', ''), '","', ','), '"\]', '')), ',')[0]) as t
, get_json_object(single_json_table.single_json, '$.Reason') as reason
, from_utc_timestamp(regexp_replace(regexp_replace(regexp_replace(get_json_object(single_json_table.single_json, '$.When'), '\{"\$date"\:"', ''), 'Z"\}', ''), 'T', ' '), 'Australia/Brisbane') as when_date
, get_json_object(single_json_table.single_json, '$.What.Who.TontoUserId') as who_tonto_id
from x
lateral view explode(actions) single_json_table as single_json;

The key to this is the lateral view explode to create single json strings which can then be inspected using the get_json_object function.
The get_json_object is case sensitive when supplying the '$.Column' name.

It's worth noting that I only needed the last value out of the 'What._t' array, hence the reverse,split,reverse. This could also have been achieved
using the regexp_extract function.

I also needed the UTC date/time that is stored in mongo converted to Brisbane time (Australian Eastern Standard Time, AEST).

I hope this is useful for someone else out there.

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.