How to push data from Hadoop to ES?

Kulasangar_Gowrisang · June 21, 2017, 11:27am

I've already gone through the guide provided by ES, but I'm still quite uncertain about how this works.

I'm trying to send data from Hadoop to my ES index. Is this possible?

So this is what I have tried so far:

I'm using Hive to do this. So far, I've simply created an external table from the Hive shell:

CREATE EXTERNAL TABLE eshadoop (id BIGINT, name STRING, time TIMESTAMP, url STRING)
STORED BY 'org.elasticsearch.hadoop.hive.EsStorageHandler'
TBLPROPERTIES('es.resource.write' = 'eshadooptest/eshadoop',
              'es.index.auto.create' = 'true',
              'es.nodes.wan.only' = 'false',
              'es.nodes' = 'localhost');

What I expected from the above statement was an index named eshadooptest in my Elasticsearch instance with the fields listed above, but the index is not created. The table itself is created, though, and I can see it in my metastore. I've also uploaded a sample log file (an Apache log) to HDFS.

What I want to know is how to push a log (the Apache log mentioned above), or any document in HDFS, to an ES index that I create either through Hive or in ES itself. Do I have to load the data from HDFS into Hive first and then push it to ES, or can I do it directly?

Please do bear with me if I'm on the wrong track, since I'm still a noob. Thanks!

james.baiera · June 21, 2017, 1:51pm

Creating a table in Hive does not necessarily allocate and initialize the index in Elasticsearch. Try inserting data into the table you've created. The index should show up after that.

An extra bit of advice: since you're writing dates to Elasticsearch, it might make sense to create the index in Elasticsearch beforehand so that your mappings are correct.
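
A minimal sketch of creating the index up front, so that the time field is mapped as a date before any rows arrive. It is only an illustration under assumptions: the helper names build_index_body and create_index are hypothetical, the field types are one plausible choice for the columns in the Hive table above, and the mapping is nested under a type name to match the eshadooptest/eshadoop resource (newer Elasticsearch versions drop the type level, so adjust the body to your version).

```python
import json
import urllib.request


def build_index_body(type_name="eshadoop"):
    """Build an index-creation body whose fields mirror the Hive table columns.

    Assumption: a mapping layout with a type level, matching the
    'eshadooptest/eshadoop' resource used in the Hive table definition.
    """
    return {
        "mappings": {
            type_name: {
                "properties": {
                    "id": {"type": "long"},
                    "name": {"type": "keyword"},
                    "time": {"type": "date"},  # explicit date mapping
                    "url": {"type": "keyword"},
                }
            }
        }
    }


def create_index(base_url, index, body):
    """Send PUT <base_url>/<index> with the JSON body and return the parsed reply."""
    request = urllib.request.Request(
        url=f"{base_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling create_index("http://localhost:9200", "eshadooptest", build_index_body()) once, before the first insert from Hive, would be the intended usage; the host and port here are assumptions based on the 'es.nodes' = 'localhost' setting.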

Kulasangar_Gowrisang · June 21, 2017, 3:03pm

Thank you so much @james.baiera for the quick response.

Yep, I'll go with that then. I'll create the index beforehand with the appropriate mappings for the fields that I'm also going to define in the Hive table.

But my concern is this: say I've got this Apache log in my HDFS directory, and I want to insert only the necessary items (host, port, log-type, etc.) from the log into my Elasticsearch index.

Assume I have host, port, log-type, etc. as fields in ES and also as columns in my Hive table. I assume it's not possible to push the values for these columns directly into my ES fields through Hive.

So I would need, say, a Java program (possibly a Spark application) to process the Apache log from HDFS and insert only the necessary items into the Hive table columns. After that, I'd be able to send the data to my ES index fields?

Would that be the correct way? Thanks again!

james.baiera · June 22, 2017, 7:08pm

If you're already processing the log data in Spark, you can use the ES-Hadoop library to load the data into Elasticsearch at the end of your Spark job. You don't necessarily have to push it to Hive before loading to Elasticsearch.
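
To make the Spark route concrete, here is a minimal PySpark sketch, not a definitive implementation. It assumes the ES-Hadoop jar is on the Spark classpath and that the log lines follow the usual Apache access-log layout. The names LOG_PATTERN, parse_line and write_logs_to_es, the choice of fields, and the HDFS path passed in are all hypothetical.

```python
import re
from datetime import datetime

# Assumed layout: host ident user [time] "method url protocol" status size
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+)'
)


def parse_line(line):
    """Return only the fields of interest, or None if the line does not parse."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    try:
        # Convert the Apache timestamp to ISO 8601 so it can be indexed as a date.
        ts = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    except ValueError:
        return None
    return {
        "host": match.group("host"),
        "time": ts.isoformat(),
        "url": match.group("url"),
        "status": int(match.group("status")),
    }


def write_logs_to_es(spark, hdfs_path, resource="eshadooptest/eshadoop",
                     nodes="localhost"):
    """Read raw log lines from HDFS, keep the parsed fields, and index them."""
    # Imported here so the parsing code above can be used without PySpark.
    from pyspark.sql import Row

    rows = (
        spark.sparkContext.textFile(hdfs_path)
        .map(parse_line)
        .filter(lambda record: record is not None)
        .map(lambda record: Row(**record))
    )
    df = spark.createDataFrame(rows)
    # "org.elasticsearch.spark.sql" is the data source provided by ES-Hadoop.
    (
        df.write.format("org.elasticsearch.spark.sql")
        .option("es.nodes", nodes)
        .mode("append")
        .save(resource)
    )
```

In this sketch Hive is not involved at all: the job reads the file from HDFS, drops lines that do not parse, and writes the selected fields to the index at the end.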

Kulasangar_Gowrisang · June 23, 2017, 5:56pm

@james.baiera thank you so much.

Could you point me towards a starting point for processing data from HDFS and moving it to ES using the ES-Hadoop connector?

Thank you.

james.baiera · June 23, 2017, 5:58pm

@Kulasangar_Gowrisang Our docs cover the features we support in the ES-Hadoop connector pretty comprehensively, but for questions about the processing libraries themselves, you're best off checking their respective documentation.

