How to query HDFS data from Elasticsearch using es-hadoop?

I have data in HDFS. How can I retrieve that data and run real-time queries on it?

@magnusbaeck
Could you please help me out?

@zachary_tong
Could you please help me out?

This will be very important for our esteemed client.

We want to build a solution using Elasticsearch with HDFS as the data repository.

Please avoid addressing specific individuals when posting. In order to make your data searchable in Elasticsearch, it needs to be indexed into Elasticsearch. When dealing with Hadoop data, this is commonly achieved through the Elasticsearch Hadoop connector.
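For illustration, here is a minimal PySpark sketch of that indexing step. It assumes the HDFS data can be loaded as a Spark DataFrame and that the elasticsearch-hadoop jar is available to Spark. The host, port and index name are placeholders, and `build_es_options` and `index_dataframe` are hypothetical helper names, not part of the connector.

```python
def build_es_options(nodes="localhost", port=9200):
    """Return connector settings as Spark writer options.

    `es.nodes` and `es.port` are elasticsearch-hadoop settings; the
    default values are placeholders for your own cluster.
    """
    return {"es.nodes": nodes, "es.port": str(port)}


def index_dataframe(df, index, nodes="localhost", port=9200):
    """Append the rows of a Spark DataFrame to an Elasticsearch index.

    Assumes `df` is a pyspark.sql.DataFrame (for example one read from
    HDFS) and that the elasticsearch-hadoop jar is on the classpath.
    """
    (
        df.write
        .format("org.elasticsearch.spark.sql")  # connector's Spark SQL data source
        .options(**build_es_options(nodes, port))
        .mode("append")
        .save(index)  # target index name (placeholder)
    )
```

With a real Spark session this might be called as `index_dataframe(spark.read.json("hdfs:///path/to/data"), "my-index")`, where both the path and the index name are made up for the example. Once the documents are indexed, they can be queried like any other Elasticsearch data, including from Kibana.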

Hi Christian,
I understand that we need to use the Elasticsearch Hadoop connector. There is also an approach posted on GitHub about hdfs-repository.

Then I am not sure I understand your question. Could you please clarify or be more specific?

Thanks @Christian_Dahlqvist

Thanks for giving me the chance to explain the problem.

We want to propose Elasticsearch to our client for one of their requirements. We want to keep HDFS as the data repository. All raw data gets moved to ES.

Once the data has been moved to HDFS, we want to query the HDFS data from Elasticsearch. Splunk has a way to do this using virtual indexes. Part of the elasticsearch-hadoop solution (repository-hdfs) seems to describe something similar: you can create an HDFS repository.

Here is the detail:

I tried it and added the repository in elasticsearch.yml. However, I don't understand how to query the data. How can I access the HDFS data in Elasticsearch, either through a query or through Kibana?

The HDFS repository allows you to create snapshots (backups) located on HDFS and restore them from HDFS, so it does not do what you seem to be expecting. All data that is to be actively queried needs to reside in the Elasticsearch cluster.
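To make the distinction concrete, here is a small Python sketch of what registering an HDFS snapshot repository looks like as a REST request, assuming the repository-hdfs plugin is installed. The repository name, namenode URI and path are placeholders, the helper name is hypothetical, and the exact settings may differ by plugin version. Note that the request goes to the `_snapshot` API, not `_search`: it only tells Elasticsearch where to store and read backups.

```python
import json


def hdfs_repository_request(name, uri, path):
    """Build the method, endpoint and JSON body for registering an
    HDFS snapshot repository (backup location, not a queryable index)."""
    body = {
        "type": "hdfs",
        "settings": {
            "uri": uri,    # HDFS namenode address (placeholder)
            "path": path,  # directory in HDFS that will hold the snapshots
        },
    }
    return "PUT", f"/_snapshot/{name}", json.dumps(body, indent=2)


method, endpoint, body = hdfs_repository_request(
    "my_hdfs_repository", "hdfs://namenode:8020/", "elasticsearch/snapshots"
)
print(method, endpoint)
print(body)
```

An index restored from such a snapshot becomes searchable only because the restore copies it back into the cluster.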

@sdaruna I have the same requirement as you. Have you found a solution?