How to index HDFS data

johan313 · August 28, 2016, 6:02pm

Hello,

I'm prototyping the use of ES and Hadoop for a project, but I can't figure out the most obvious part.

I have a Hadoop cluster that holds some log data on HDFS. I installed ES on YARN following this guide: https://www.elastic.co/guide/en/elasticsearch/hadoop/current/ey-usage.html. Elasticsearch seems to work properly; however, it stores all data locally. The guide mentions this as the default storage option, so that's fine.

Question one: ES created an index called "Titan" during the installation. What is this? Judging by its content, it has nothing to do with any data I have put into HDFS.

Question two: What is the proper way to read the HDFS data into an ES index? I feel really stupid, but apart from writing an application that pushes it through REST, I could not figure this out. Is there any out-of-the-box support for populating ES indexes?

R
Johan
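For comparison, the "push it through REST" route from question two essentially means calling the _bulk API: each document is sent as an action line followed by a source line, all newline-delimited. Below is a minimal Python sketch under stated assumptions: the log records have already been read from HDFS (for example with "hdfs dfs -cat") and parsed into dicts, and the index name "hdfs-logs" and the localhost URL are hypothetical. On the 2.x/5.x clusters of that time, the action line also needs a "_type".

```python
import json
import urllib.request


def to_bulk_body(docs, index, doc_type=None):
    """Build an NDJSON body for POST /_bulk: one action line plus one source line per document."""
    action_meta = {"_index": index}
    if doc_type is not None:
        # Older clusters (2.x/5.x) expect a mapping type in the action line.
        action_meta["_type"] = doc_type
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": action_meta}))
        lines.append(json.dumps(doc))
    if not lines:
        return ""
    # The bulk body must be terminated by a final newline.
    return "\n".join(lines) + "\n"


def post_bulk(body, url="http://localhost:9200/_bulk"):
    """POST a bulk body to a (hypothetical) local node and return the parsed JSON response."""
    req = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/x-ndjson"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical parsed log records; nothing is sent to a cluster here.
    records = [{"message": "GET /index.html 200"}, {"message": "GET /missing 404"}]
    print(to_bulk_body(records, "hdfs-logs"), end="")
```

A real loader would also need to check the "errors" flag in the bulk response and send large inputs in batches. This is the kind of bookkeeping a connector can take off your hands.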

jasontedor · August 28, 2016, 8:37pm

That's not possible: index names must be lowercase. Maybe you meant "titan"? Either way, that index came from your side. Perhaps you sent a POST request to /titan while automatic index creation was enabled?
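The lowercase rule is one of several constraints Elasticsearch places on index names. Here is a small Python checker, written from the restrictions listed in current Elasticsearch documentation. Older versions differed slightly; for example, colons used to be allowed. The function name is hypothetical, and this is not an Elasticsearch API.

```python
# Characters that current Elasticsearch releases reject in index names.
FORBIDDEN_CHARS = set('\\/*?"<>| ,#')


def index_name_problems(name):
    """Return the reasons 'name' would be rejected as an index name (empty list if it looks valid)."""
    if not name:
        return ["must not be empty"]
    problems = []
    if name != name.lower():
        problems.append("must be lowercase")
    if name in (".", ".."):
        problems.append("cannot be '.' or '..'")
    if name[0] in "-_+":
        problems.append("cannot start with '-', '_' or '+'")
    bad = sorted(set(name) & FORBIDDEN_CHARS)
    if bad:
        # repr() makes invisible characters such as a space readable.
        problems.append("contains forbidden characters: " + ", ".join(repr(c) for c in bad))
    if len(name.encode("utf-8")) > 255:
        problems.append("longer than 255 bytes")
    return problems


if __name__ == "__main__":
    for candidate in ["titan", "Titan", "_logs", "web logs"]:
        print(candidate, index_name_problems(candidate))
```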

You can use the elasticsearch-hadoop framework.
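As a rough illustration of the elasticsearch-hadoop route, here is a PySpark sketch that reads JSON log lines from HDFS and writes them through the connector's Spark SQL data source ("org.elasticsearch.spark.sql"). It assumes the es-hadoop (or elasticsearch-spark) jar is on the Spark classpath and that the logs hold one JSON object per line. The HDFS path, node name and "hdfs-logs/log" resource are hypothetical. "es.nodes" and "es.port" are real connector settings. Treat this as a starting point rather than a tested recipe.

```python
def es_options(nodes, port=9200):
    """Connector settings for the DataFrame writer; es.nodes and es.port are elasticsearch-hadoop keys."""
    return {"es.nodes": nodes, "es.port": str(port)}


def index_hdfs_logs(spark, hdfs_path, resource, options):
    """Read JSON-lines logs from HDFS and append them to Elasticsearch via es-hadoop."""
    # Assumes one JSON object per line; other log formats need parsing first.
    df = spark.read.json(hdfs_path)
    (
        df.write.format("org.elasticsearch.spark.sql")
        .options(**options)
        .mode("append")
        .save(resource)  # "index/type" on the 2.x/5.x releases of that time
    )
```

With pyspark available and the job launched through something like "spark-submit --jars elasticsearch-hadoop-<version>.jar", it could be called as index_hdfs_logs(spark, "hdfs:///logs/*.json", "hdfs-logs/log", es_options("es-node-1")), where spark is a SparkSession. The connector also ships MapReduce, Hive and Pig integrations if Spark is not part of the stack.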

Related topics

Store indexes in ES while the data stays in HDFS (Elasticsearch, es-hadoop) · 4 replies · 965 views · July 6, 2017
How should I search data in hdfs (Elasticsearch, es-hadoop) · 3 replies · 1875 views · July 6, 2017
Query hdfs from ES (Elasticsearch) · 2 replies · 413 views · July 5, 2017
Have a couple of questions on ES (Elasticsearch) · 2 replies · 338 views · July 6, 2017
Can I store ES indices on HDFS only? (Elasticsearch, es-hadoop) · 4 replies · 912 views · July 6, 2017