How do you explicitly set the _ID field when indexing via HIVE using the "es.input.json" option? in 1.7 we were able to use the PATH command to extract the ID from the corresponding JSON record. However, now I'm unsure how to use data from the JSON for the ID.... or if you can. I tried cheating and giving the mapping command a json path (a la the RESOURCE option for document type), but no joy.
I tried the es.mapping.id originally, but it expects a HIVE field to exist (hence the map). I was trying to maintain the JSON only setting while pulling an ID from the JSON string. In reality, it doesn't really matter. Having ES automatically create the _ID field is okay, I'll just need to go into the JSON for any queries on the real ID. Still, it would be nice to be able to push this up using the hadoop libraries.
Looks like I made a mistake. I tried to set the es.mappind.id like the es.resource with {} around the field name. Changing it to just the field_name (in the JSON) worked. I didn't find anything in the documentation that mentioned format.
e.g. for a JSON string that looked like:
{"id":"xxxx", "item": {"key1": 1, "key2",2}}
...
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'test/doc',
'es.nodes'='my.node',
'es.mapping.id' = 'id',
'es.input.json' ='yes');
Well, what made you think you should be using {. The connector can extract from both JSON or library types hence why only the actual field name is relevant.
Apache, Apache Lucene, Apache Hadoop, Hadoop, HDFS and the yellow elephant
logo are trademarks of the
Apache Software Foundation
in the United States and/or other countries.