Mapping a nested field in Hive table properties


(Gowtham Sadasivam) #1

Hello,

I'm pushing documents from HDFS to Elasticsearch using Hive and the ES-Hadoop connector. The documents in HDFS have the following structure:

{
    "app":{
        "id":"5320614578",
        "name":"WhatsApp"
    },
    "clientid":"XXXXXX-888E-XXXXXXX-F7E1XXXX900RTY",
    "anotherid":"14378452369947",
    "istdt":"2016-09-21 00:10:34",
    "anothervalue":{
        "goo":"This is a string",
        "foo":"Foo foo FoO FOo FOO"
    },
    "cize":"72E90v",
    "devid":"C7E1R7X1R0G5b"
}

I would like to map the field "app.name" to a new field "application" using Hive table properties. The documentation page here (https://www.elastic.co/guide/en/elasticsearch/hadoop/current/hive.html#hive-alias) shows how to map a simple field, but I couldn't find a way to map a nested field such as "app" in the document structure above. I tried the following two approaches, but neither of them worked.

TBLPROPERTIES (
    'es.nodes'='xx.xxx.xxx.xx',
    'es.mapping.names' = 'date:@timestamp, app.name:application',
    'es.batch.write.retry.count' = '-1',
    'es.batch.size.bytes' = '10mb',
    'es.batch.size.entries' = '10000',
    'es.resource'='{logname}-{date:YYYY.MM.dd}/{logtype}'
);

^ In the above syntax I tried to map the field "app.name" to a new field "application" using dot notation, but this doesn't work.

TBLPROPERTIES (
    'es.nodes'='xx.xxx.xxx.xx',
    'es.mapping.names' = 'date:@timestamp, app[name]:application',
    'es.batch.write.retry.count' = '-1',
    'es.batch.size.bytes' = '10mb',
    'es.batch.size.entries' = '10000',
    'es.resource'='{logname}-{date:YYYY.MM.dd}/{logtype}'
);

^ In the above syntax I tried to map the field "app[name]" to the new field "application", using square brackets "[ ]" as Hive's way of specifying nested fields, but this doesn't work either.

That said, the simple mapping from the field "date" to "@timestamp" works well; only "app.name", or any other nested field for that matter, fails to map.
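
A possible workaround, sketched here under assumptions and not confirmed anywhere in this thread, is to flatten the nested field in the Hive query rather than in 'es.mapping.names'. The sketch assumes the JSON documents sit one per line in HDFS and are readable through Hive's HCatalog JsonSerDe. The table names (logs_raw, logs_es), the HDFS location, and the 'es.resource' value are hypothetical and chosen only for illustration.

```sql
-- Hypothetical source table over the JSON documents in HDFS.
-- Assumes one JSON document per line and hive-hcatalog-core on the classpath.
CREATE EXTERNAL TABLE logs_raw (
    app          STRUCT<id:STRING, name:STRING>,
    clientid     STRING,
    anotherid    STRING,
    istdt        STRING,
    anothervalue STRUCT<goo:STRING, foo:STRING>,
    cize         STRING,
    devid        STRING
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE
LOCATION '/path/to/json/logs';   -- placeholder path

-- Hypothetical Elasticsearch-backed table with a flat "application" column.
CREATE EXTERNAL TABLE logs_es (
    application STRING,
    clientid    STRING,
    istdt       STRING
)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES (
    'es.nodes'    = 'xx.xxx.xxx.xx',
    'es.resource' = 'logs/entry'   -- placeholder index/type
);

-- Hive resolves the struct member with dot notation in the SELECT,
-- so the connector only ever sees the top-level column "application".
INSERT OVERWRITE TABLE logs_es
SELECT app.name AS application,
       clientid,
       istdt
FROM logs_raw;
```

This sidesteps the mapping question instead of answering it: the rename happens in the query, so 'es.mapping.names' is not involved for that field.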

Is there a way or syntax to map a nested value to a new field through the Hive table properties?

Thanks in advance.

