Issue with Hive mapping of ES nested fields

(Yari Marchetti) #1

I'm having an issue mapping a Hive field to an ES nested field. In ES I have a document like:

"name": "test1",
"custom_data": {
"session_id": "d41442b987b5bc8103000a2cc2cfb062",

In Hive I'm trying to map it with:

CREATE EXTERNAL TABLE test_es (sessionid string, name string)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'test/event',
'' = 'false',
'es.nodes' = 'localhost',
'es.mapping.names' = 'sessionid:custom_data.session_id, name:name');

but when I query with a simple:

SELECT * FROM test_es

I keep on getting this error:

Error: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing writable {custom_data=[]}

I'm running Hive 1.1.0 and ES-Hadoop 2.2.0-rc1. Do you have any idea?


(Yari Marchetti) #2

In the end I found the issue: it looks like the error is triggered when the top-level structure (custom_data in this example) is missing from a document. I tried setting '' to true, but it did not help. Is there any way to prevent this from happening?

(Costin Leau) #3

If the top-level field is not available, neither is the nested structure underneath it. Potentially some checks could be added to try and mock the missing field; a GitHub issue would be great in this case.
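
To illustrate the kind of check being discussed, here is a minimal Python sketch of resolving a dotted path such as custom_data.session_id when the parent may be absent. This is not ES-Hadoop code: resolve_path and the sample documents are hypothetical, and the sketch simply assumes that a missing or non-object parent should yield a null value rather than an error.

```python
from collections.abc import Mapping
from typing import Any, Optional


def resolve_path(doc: Any, path: str) -> Optional[Any]:
    """Follow a dotted path through a document (hypothetical helper).

    Returns None as soon as a level is missing or is not an object,
    instead of raising an error.
    """
    current = doc
    for key in path.split("."):
        # A missing parent (or a non-object such as []) has nothing underneath it.
        if not isinstance(current, Mapping) or key not in current:
            return None
        current = current[key]
    return current


# Sample documents (assumed shapes, based on the example in this thread).
complete = {
    "name": "test1",
    "custom_data": {"session_id": "d41442b987b5bc8103000a2cc2cfb062"},
}
missing_parent = {"name": "test2"}
empty_parent = {"name": "test3", "custom_data": []}

for doc in (complete, missing_parent, empty_parent):
    print(doc["name"], resolve_path(doc, "custom_data.session_id"))
```

With a check like this, documents that lack custom_data (or carry it as an empty value, as in the {custom_data=[]} writable from the error message) would produce a null session id rather than a failure.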
