Hi,
Truly appreciate any help to resolve the below issue I am facing.
Issue:
Hive support for ES for nested field(JSON string) with other fields.
How to represent the nested field in HIVE where the structure in unknown initially.
I am using elastic search hive bridge to insert data in ES (jar version elasticsearch-hadoop-hive-2.1.2-2.1.2.jar)
The indexed data has many fields and one of those fields is a JSON string.Need is to index this field(JSON string) as a nested field.
so that we can search and filter though the nested JSON document using ES capabilities.
Also another challenge is that this JSON filed is kind of dynamic.It can have new key and values added to it for each row.the names of the fields are not fixed)
eg: ES should have the fields (attr,field1,field2) where attr should be a nested field (json document) which can have dynamic key and value.
sample values from hive table
select attr,field1,field2 from table;
row1 {"A":"GF","M":"vf"},"field1VAL","field2VAL"
row2 {"F":"BD"},"field1VAL","field2VAL"
row3 {"A":"BD","H":"ki","R":"we","M":"mg"},"field1VAL","field2VAL"
IN ES documentation it is specified that es.input.json parameter can be specified as true when the output table has a single field.
but in my case there are other fields too which are present in the hive table and which has to go to ES index.
(only one of the fields (attr) has to be inserted as nested JSON rest are string fields)
When we define the mapping and insert records with curl commands everything is fine.
But through HIVE insert is failing since the mapping is defined for the JSON string field as "nested"
and in HIVE side it is defined as string.Do I need to use struct?How do I define struct in HIVE side (the structure of JSON is not known while mapping is created)
How do we manage it since it can be dynamic as explained above and field names are unknown while mapping creation
step1> INDEX Creation with mappings
crul command is used to create the index mapping first and then the below hive table is mapped to the index and
insert command to insert data to the hive ES table.
"mappings": {
"index_type": {
"dynamic_templates": [{
"string_templates": {
"mapping": {
"index": not_ananlyzed","type":"string"},path_match": "attr.*"
}
}],
"properties": {
"attr": {
"type": "nested"
},
"field1": {
"type": "string",
"index": "not_analyzed"
},
"field2": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
step2> defining HIVE table pointing to the created index
HIVE
create table ES_TABLE (attr string ,field1 string,field2 string) row format delimited
stored by 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.mapping.names'='attr:attr,field1:field1,field2:field2',
'es.index.auto.creation'='false',
'es.resource'..'es.nodes'..'es.port'..)
step3> insert data to HIVE table which is pointing to the created index
Insert overwrite table ES_TABLE SELECT * from MAIN_TABLE;
since the mapping says "attr" is nested and in HIVE ES table I have string I get ERROR
ERROR
Hive Runtime Error while closing operation:Found unrecoverable error
MapperParsingException: object mapping for [index_type] tried to parse field[attr]as object,
** but got EOF, has a concrete value been provided to it?Bailing out**
cause by :org.elasticsearch.hadoop.rest.EsHadoopInvalidRequest:Found unrecoverable error
Question
1)Is there any way i can index all the 3 fields where the json string field can be loaded as nested field
through hive?
2)if not is it possible with hive is it possible with Mapreduce?