Please avoid addressing specific individuals when posting. In order to make your data searchable in Elasticsearch, it needs to be indexed into Elasticsearch. When dealing with Hadoop data, this is commonly achieved through the Elasticsearch Hadoop connector.
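For illustration, here is a minimal sketch of what indexing through the connector could look like from PySpark. It assumes the elasticsearch-spark jar of the Elasticsearch Hadoop connector is on the Spark classpath. The host names, the index name and both helper functions are hypothetical, so treat it as a starting point rather than a reference setup.

```python
def es_write_options(nodes, port, index):
    """Build the option map read by the ES-Hadoop Spark data source."""
    return {
        "es.nodes": nodes,      # comma-separated Elasticsearch hosts
        "es.port": str(port),   # REST port of the cluster
        "es.resource": index,   # target index for the documents
    }


def index_dataframe(df, options):
    """Append the rows of a Spark DataFrame to Elasticsearch as documents."""
    (df.write
       .format("org.elasticsearch.spark.sql")
       .options(**options)
       .mode("append")
       .save())


# Hypothetical hosts and index name, for illustration only.
options = es_write_options("es-host-1,es-host-2", 9200, "raw-events")
```

With a SparkSession available, something like `index_dataframe(spark.read.json("hdfs:///data/raw/events"), options)` would then copy the HDFS data into the index (the HDFS path is made up). The data stays in HDFS as well; Elasticsearch holds its own indexed copy.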
Thanks for giving me the chance to explain the problem.
We want to propose Elasticsearch to our client for one of their requirements. We want to keep HDFS as the data repository. All raw data gets moved to ES.
Once the data has been moved to HDFS, we want to query the HDFS data from Elasticsearch. Splunk has a way to do this using virtual indexes. There is a part of the elasticsearch-hadoop solution (repository-hdfs) that seems to speak in the same terms: you can create an HDFS repository.
Here is the detail:
I tried it and added the repository in elasticsearch.yml. However, I don't understand how to query the data. How can I access the data in HDFS from Elasticsearch, either through a query or through Kibana?
The HDFS repository allows you to create snapshots (backups) located on HDFS and to restore them from HDFS, so it does not do what you seem to be expecting. All data that is to be actively queried, however, needs to reside in the Elasticsearch cluster.
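To make the snapshot workflow concrete, here is a rough sketch of the REST calls involved, written as Python helpers that only build the requests and do not send them. It assumes a version where the repository is registered through the snapshot API rather than elasticsearch.yml. The repository name, snapshot name, namenode URI and path are all hypothetical.

```python
import json


def register_hdfs_repository(repo, uri, path):
    """Request that registers an HDFS-backed snapshot repository."""
    body = {"type": "hdfs", "settings": {"uri": uri, "path": path}}
    return ("PUT", f"/_snapshot/{repo}", json.dumps(body))


def create_snapshot(repo, snapshot):
    """Request that backs up the cluster's indices into the repository."""
    return ("PUT", f"/_snapshot/{repo}/{snapshot}", "")


def restore_snapshot(repo, snapshot, indices):
    """Request that copies the named indices back into the cluster."""
    body = {"indices": ",".join(indices)}
    return ("POST", f"/_snapshot/{repo}/{snapshot}/_restore", json.dumps(body))


# Hypothetical names, for illustration only.
requests_to_send = [
    register_hdfs_repository("my_hdfs_repo", "hdfs://namenode:8020/", "es/snapshots"),
    create_snapshot("my_hdfs_repo", "snapshot_1"),
    restore_snapshot("my_hdfs_repo", "snapshot_1", ["raw-events"]),
]
```

Note that the restore step is what makes the data queryable again: the indices are copied back into the cluster, and the files sitting in HDFS are never searched directly.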