Reading json data from ES to HIVE with a single string field


(deepakas) #1

Is it possible to get the complete json from es using the following table structure. I am able to get the data when I define the schema for json data in hive table. But when I use a single field to get the json string I am getting NULL as output. I am able to use the same structure to write json data to ES.

CREATE EXTERNAL TABLE es_test (data STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'esindex/test',
'es.input.json' = 'yes','es.nodes' = 'host1:9200' ,'es.query' = '?q=*');


(Iris Panabaker) #2

I would recommends to use JSON tools such as http://codebeautify.org/jsonviewer and http://jsonformatter.org to debug , View and validate JSON data.


(deepakas) #3

It is a valid json. I am not sure if it is related to having a keyword like name , start as keys in the json string. Also I have fields with Capital letters in the field name like -> englishName. I am able to pull some of the fields from the beginning of the json when I give all the fields in my hive table.


(Costin Leau) #4

If you are trying to return the docs from ES in JSON format, that is not supported. It shouldn't be hard to use though.
Your table configuration is confusing though - you define both an input (es.input.json) and a query (basically reading from it).
It's recommended to split the two - you'll end up with two different table that point out to the same index sure, but a table with an associated query means the data is filtered.

P.S. Defining a query that returns everything it's not just redundant, it's useless.


(system) #5