Hello,
I'm pushing documents stored in HDFS to Elasticsearch using Hive and the ES-Hadoop connector. The documents in HDFS have the following structure:
{
  "app":{
    "id":"5320614578",
    "name":"WhatsApp"
  },
  "clientid":"XXXXXX-888E-XXXXXXX-F7E1XXXX900RTY",
  "anotherid":"14378452369947",
  "istdt":"2016-09-21 00:10:34",
  "anothervalue":{
    "goo":"This is a string",
    "foo":"Foo foo FoO FOo FOO"
  },
  "cize":"72E90v",
  "devid":"C7E1R7X1R0G5b"
}
I would like to map the field "app.name" to a new field "application" using Hive table properties. The documentation page (https://www.elastic.co/guide/en/elasticsearch/hadoop/current/hive.html#hive-alias) shows how to map a simple top-level field, but I couldn't find a way to map a field nested inside an object, such as "name" inside "app" in the document above. I tried the following approaches, but neither of them worked.
TBLPROPERTIES (
  'es.nodes'='xx.xxx.xxx.xx',
  'es.mapping.names' = 'date:@timestamp, app.name:application',
  'es.batch.write.retry.count' = '-1',
  'es.batch.size.bytes' = '10mb',
  'es.batch.size.entries' = '10000',
  'es.resource'='{logname}-{date:YYYY.MM.dd}/{logtype}'
);
^ In the syntax above, I tried to map the field "app.name" to the new field "application", but this doesn't work.
TBLPROPERTIES (
  'es.nodes'='xx.xxx.xxx.xx',
  'es.mapping.names' = 'date:@timestamp, app[name]:application',
  'es.batch.write.retry.count' = '-1',
  'es.batch.size.bytes' = '10mb',
  'es.batch.size.entries' = '10000',
  'es.resource'='{logname}-{date:YYYY.MM.dd}/{logtype}'
);
^ In the syntax above, I tried to map "app[name]" to the new field "application", using Hive's square-bracket "[ ]" notation for nested fields, but this doesn't work either.
That said, the simple mapping from "date" to "@timestamp" works fine; it is only "app.name", or any other nested field, that doesn't get mapped.
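For comparison, a rough and untested sketch of a flattening workaround is shown below. Instead of renaming the nested field through es.mapping.names, it declares "application" as its own top-level column on the Elasticsearch-backed table and fills it with app.name during an INSERT ... SELECT. It assumes that the source table (the hypothetical source_logs) declares app as STRUCT<id:STRING, name:STRING>; if app is a MAP instead, app['name'] would replace app.name. The table names, the column list and the column types are all illustrative assumptions, not the actual schema.

```sql
-- Hypothetical Elasticsearch-backed table: the nested app.name value is exposed
-- as its own top-level column "application" (names/types are illustrative only).
CREATE EXTERNAL TABLE es_logs (
  `date`       STRING,
  application  STRING,
  clientid     STRING,
  logname      STRING,
  logtype      STRING
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
  'es.nodes' = 'xx.xxx.xxx.xx',
  -- only the top-level rename is left here
  'es.mapping.names' = 'date:@timestamp',
  'es.resource' = '{logname}-{date:YYYY.MM.dd}/{logtype}'
);

-- Flatten while copying: dot notation reads one field of a STRUCT column.
-- Assumes the hypothetical source table source_logs has app STRUCT<id:STRING, name:STRING>.
INSERT OVERWRITE TABLE es_logs
SELECT `date`, app.name AS application, clientid, logname, logtype
FROM source_logs;
```

The trade-off is an extra column and an explicit SELECT per nested field, rather than a single table property.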
I would like to know whether there is a way or syntax to map a nested value to a new field in Hive.
Thanks in advance.