JSON array mapping into Hive


(Krishna Chaitanya) #1

I would like to read elasticsearch data into Hive, and have been following documentation here.

One of my elasticsearch fields (ACTIVITIES) is an array of JSON objects with structure like this:

ACTIVITIES:{
   {NAME: "..."
     ID: "..."
     TIME: "..."
   },
   {NAME: "..."
     ID: "..."
     TIME: "..."
   },
 .....
}

Here is the mapping of that field:

         "ACTIVITIES" : {
            "properties" : {
              "NAME" : {
                "type" : "text",
                "norms" : false,
                "fields" : {
                  "keyword" : {
                    "type" : "keyword"
                  }
                }
              },
              "ID" : {
                "type" : "text",
                "norms" : false,
                "fields" : {
                  "keyword" : {
                    "type" : "keyword"
                  }
                }
              },
              "TIME" : {
                "type" : "text",
                "norms" : false,
                "fields" : {
                  "keyword" : {
                    "type" : "keyword"
                  }
                }
              }
            }
          },

I want to map this into Hive column(s). I tried to create the column as an array of struct array<struct<name:string,id:string,time:string>>
and give mapping as
'es.mapping.names' : 'activities.name:ACTIVITIES.NAME, activities.id:ACTIVITIES.ID, activities.time:ACTIVITIES.TIME '

All other fields from ES are read correctly into Hive except this JSON array, which is read as NULL. I dont know right way to do, because I couldn't find type conversion for JSON array like this in documentation.

Please help


(Krishna Chaitanya) #2

Well, I do not know if this is the right way, but this seems to have worked.

CREATE EXTERNAL TABLE es-hive (column1 string, ..., activities array<struct<name:string,id:string, time:string>>)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'es-index-name/type-name',
                ....
               'es.query' = '?q=*',
               'es.output.json' = 'true',
               'es.mapping.names' = 'column1:some-es-field, ...., activities:ACTIVITIES');

(system) #3

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.