Getting _id field in elasticsearch to map to a field in HIVE


(hiveesuser) #1

Hi,

I am working on a project to integrate hive and elasticsearch and for one
of our use case we are loading data from ELASTISCSEARCH --> HIVE.

During this process I want to store the _id field which is in elasticsearch
document in hive. I am able to get the fields which are part of _source
like messages, @timestamp etc but I am not able to get the _id associated
with that particular document.

The following is the sample table I am trying to create

create external table eshivetable (id string,eventdate timestamp, host
string, username string, message string) STORED BY
'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource' = 'logstash-*/syslog', 'es.mapping.names' =
'id:_id,eventdate:@timestamp,host:host,username:username,message:message','es.nodes'='10.10.10.50','es.port'='9200','es.query'='?q=type:syslog');

So when I select id it returns a null value...

Can some one help me with this please.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/5d2eb38d-9f0d-4329-ba2b-0d28c06f98e5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Dmytro Popovych) #2

Have the same issue, does anyone know the solution?


(Dmytro Popovych) #3

Found the solution:

create external table some_table (id string)
stored by 'org.elasticsearch.hadoop.hive.EsStorageHandler'
tblproperties (
    'es.resource' = '<index>/<type>',
    'es.nodes' = '<es_host>',
    'es.read.metadata' = 'true',
    'es.mapping.names' = 'id:_metadata._id'
);

(Thomas Decaux) #4

What about:

es.mapping.id

From https://www.elastic.co/guide/en/elasticsearch/hadoop/2.4/configuration.html#cfg-mapping