How to index HDFS data


I'm prototyping the use of ES and Hadoop for a project, but I cannot figure out the most obvious things.

I have a Hadoop cluster that contains some log data on HDFS. I installed ES Yarn according to this guide: Elasticsearch seems to work properly; however, it stores all data locally. The guide mentions this as the default storage solution, so that's fine.

Question one: ES created an index called "Titan" during the installation. What is this? Looking at its content, it has nothing to do with any data I have put into HDFS.

Question two: What is the proper way to read the HDFS data into an ES index? I feel really stupid, but besides writing an application that pushes it through REST, I could not figure this out. Is there any out-of-the-box support for populating ES indexes?


That's not possible: index names must be lowercase. Maybe you meant titan? Either way, ES does not create that index during installation, so it came from something on your end. Perhaps you sent a POST request to /titan and have automatic index creation enabled?

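For question two, you can use the elasticsearch-hadoop framework, which lets Hadoop jobs (MapReduce, Hive, Pig, Spark) read from and write to Elasticsearch.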
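
As a rough illustration, here is a minimal PySpark sketch of that approach. It assumes the elasticsearch-hadoop (elasticsearch-spark) jar is already on the Spark classpath and that the logs are plain text files; the function names, the HDFS path, the host, and the index name are hypothetical placeholders, not anything from your setup. Depending on the Elasticsearch version, the target may need to be given as "index/type" rather than just "index".

```python
def es_write_options(index, nodes="localhost", port=9200):
    """Build connector settings for a write (hypothetical helper).

    Index names must be lowercase, so reject anything else early.
    """
    if index != index.lower():
        raise ValueError("index names must be lowercase: %r" % index)
    return {
        "es.nodes": nodes,      # Elasticsearch host(s) to connect to
        "es.port": str(port),   # REST port, passed as a string
    }


def index_hdfs_logs(hdfs_path, index, nodes="localhost", port=9200):
    """Read text files from HDFS and append each line to an ES index."""
    # Imported lazily so the helper above works without Spark installed.
    from pyspark.sql import SparkSession

    options = es_write_options(index, nodes, port)
    spark = SparkSession.builder.appName("hdfs-to-es").getOrCreate()

    # spark.read.text yields one row per line in a column named "value".
    lines = spark.read.text(hdfs_path).withColumnRenamed("value", "message")

    (lines.write
          .format("org.elasticsearch.spark.sql")  # data source from the connector jar
          .options(**options)
          .mode("append")
          .save(index))


if __name__ == "__main__":
    # Placeholder path and index name; adjust to your cluster.
    index_hdfs_logs("hdfs:///logs/*.log", "logs")
```

Each log line ends up as one document with a single "message" field. If you need structured fields, parse the lines into columns before the write step.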