Issue with Hive mapping of ES nested fields


(Yari Marchetti) #1

Hello,
I'm having an issue mapping a Hive field to an ES nested field. In ES I have a document like:

{
  "name": "test1",
  "custom_data": {
    "session_id": "d41442b987b5bc8103000a2cc2cfb062"
  }
}

In Hive I'm trying to map it with:

CREATE EXTERNAL TABLE test_es (sessionid string, name string)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'test/event',
'es.index.auto.create' = 'false',
'es.nodes' = 'localhost',
'es.mapping.names' = 'sessionid:custom_data.session_id, name:name'
)

but when I query with a simple:

SELECT * FROM test_es

I keep on getting this error:

Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable {custom_data=[]}

I'm running Hive 1.1.0 and ES-Hadoop 2.2.0-rc1. Do you have any idea?

Thanks,
Yari


(Yari Marchetti) #2

In the end I found the cause: it looks like the error is triggered when the top-level structure (custom_data in this example) is missing from a document. I tried setting 'es.field.read.empty.as.null' to true, but it made no difference. Is there any way to prevent this from happening?


(Costin Leau) #3

If the top-level field is not available, neither is the nested structure underneath it. Some checks could potentially be added to mock the missing field; a GitHub issue would be great in this case.
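
To make the kind of check described above concrete, here is a minimal sketch in plain Python. It is not ES-Hadoop code and does not reflect how the connector is implemented. The helper names (resolve_path, to_row) and the second sample document ("test2") are hypothetical and exist only for illustration. It walks a dotted path such as custom_data.session_id and yields a null value when the parent object is missing, rather than failing.

```python
from typing import Any, Dict, Optional


def resolve_path(doc: Dict[str, Any], dotted_path: str) -> Optional[Any]:
    """Walk a dotted path through a nested dict.

    Returns None as soon as a segment is missing or the current value is
    not an object, instead of raising an error.
    """
    current: Any = doc
    for segment in dotted_path.split("."):
        if not isinstance(current, dict) or segment not in current:
            return None
        current = current[segment]
    return current


def to_row(doc: Dict[str, Any], mapping: Dict[str, str]) -> Dict[str, Any]:
    """Build a flat row (column name -> value) from a nested document."""
    return {column: resolve_path(doc, path) for column, path in mapping.items()}


# Same column-to-field mapping as in the es.mapping.names setting above.
MAPPING = {"sessionid": "custom_data.session_id", "name": "name"}

# Document from the original post.
complete = {
    "name": "test1",
    "custom_data": {"session_id": "d41442b987b5bc8103000a2cc2cfb062"},
}

# Hypothetical document with no custom_data object at all.
missing_parent = {"name": "test2"}

if __name__ == "__main__":
    print(to_row(complete, MAPPING))
    print(to_row(missing_parent, MAPPING))
```

With this approach a document lacking custom_data produces a row whose sessionid is null, while name is still populated.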

