Query HDFS data using ES and Kibana

Hi, I would like to set up a Kibana GUI to query some log files stored in HDFS, so I'm looking at the Hadoop-ES connector to see whether it can help meet this requirement. However, I have a few doubts:

  1. Do I need to store all of these log files in Elasticsearch in order to query them with Kibana? If so, I will end up with two copies of the logs, one in HDFS and one in Elasticsearch.

  2. How does the Hadoop-ES connector connect to HDFS? I have gone through the documentation, but it only mentions connecting to Hive, Pig, Spark, Storm, etc., and says nothing about HDFS.

  3. Where should I install the Hadoop-ES connector: on the ES cluster or on the Hadoop cluster? I have gone through the installation guide, but it doesn't state the installation steps clearly. Can anyone help?

Thank you.

Regards,
KK

Yes, the data will need to be indexed into Elasticsearch; Kibana can only query data that lives in an Elasticsearch index.

It does not connect Elasticsearch to HDFS directly; you will need to use it from a MapReduce job (or one of the other integrations described in the documentation, such as Hive, Pig, or Spark) to import the data into Elasticsearch.
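For illustration, here is a minimal map-only sketch using ES-Hadoop's `EsOutputFormat` with the old `mapred` API. It assumes the HDFS log files already contain one JSON document per line; the ES address, index name, and input path are placeholders, not anything from this thread:

```java
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.elasticsearch.hadoop.mr.EsOutputFormat;

public class HdfsLogsToEs {

    // Identity mapper: each input line is assumed to already be one JSON document.
    public static class JsonLineMapper extends MapReduceBase
            implements Mapper<LongWritable, Text, NullWritable, Text> {
        @Override
        public void map(LongWritable key, Text line,
                        OutputCollector<NullWritable, Text> out, Reporter reporter)
                throws IOException {
            out.collect(NullWritable.get(), line);
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(HdfsLogsToEs.class);
        conf.setJobName("hdfs-logs-to-es");
        conf.setSpeculativeExecution(false);       // avoid duplicate writes to ES
        conf.set("es.nodes", "es-host:9200");      // placeholder: your ES address
        conf.set("es.resource", "logs/entry");     // placeholder: target index/type
        conf.set("es.input.json", "yes");          // values are already JSON strings

        conf.setMapperClass(JsonLineMapper.class);
        conf.setNumReduceTasks(0);                 // map-only job
        conf.setOutputFormat(EsOutputFormat.class);
        conf.setOutputKeyClass(NullWritable.class);
        conf.setOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(conf, new Path("/logs/current")); // placeholder HDFS path

        JobClient.runJob(conf);
    }
}
```

Setting `es.input.json` to `yes` tells the connector to ship each `Text` value to Elasticsearch as-is; if the logs are not JSON, the mapper would instead need to emit `MapWritable` documents built from the parsed fields.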

You use it from the Hadoop side: the connector is a jar that goes on the classpath of the jobs that use it, not something you install into the Elasticsearch cluster. I'll have to leave the detailed setup steps to someone else, though.

Since my primary storage for the logs is HDFS, I can only keep hot data in Elasticsearch, while cold data stays in HDFS. If a user wants to query cold data from Kibana, is it feasible to export that data from HDFS to ES and present it in Kibana on demand, only when the user triggers the query?

No, not as far as I know; there is nothing built in for this. You would need to set up a job that does the export yourself, and indexing large amounts of cold data could take a while, so it would not be instant.
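Purely as a hypothetical sketch of what such a trigger-driven job could look like: a driver that indexes a single day of cold logs into a throwaway per-day index, reusing the `JsonLineMapper` from the sketch above. The ES address, index naming scheme, and HDFS directory layout are all assumptions:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.elasticsearch.hadoop.mr.EsOutputFormat;

public class ColdLogsOnDemand {
    public static void main(String[] args) throws Exception {
        String day = args[0];                      // e.g. "2016-01-01", supplied by the trigger
        JobConf conf = new JobConf(ColdLogsOnDemand.class);
        conf.setJobName("cold-logs-" + day);
        conf.setSpeculativeExecution(false);
        conf.set("es.nodes", "es-host:9200");                    // placeholder
        conf.set("es.resource", "cold-logs-" + day + "/entry");  // throwaway per-day index
        conf.set("es.input.json", "yes");

        conf.setMapperClass(HdfsLogsToEs.JsonLineMapper.class);  // reuse the mapper from above
        conf.setNumReduceTasks(0);
        conf.setOutputFormat(EsOutputFormat.class);
        conf.setOutputKeyClass(NullWritable.class);
        conf.setOutputValueClass(Text.class);

        FileInputFormat.setInputPaths(conf, new Path("/logs/archive/" + day)); // assumed layout

        JobClient.runJob(conf); // blocks until the day's data is indexed and queryable in Kibana
    }
}
```

Once the user is done, the temporary index can be removed with a plain `DELETE cold-logs-<day>` request against Elasticsearch, so the cluster only ever holds the cold data that was actually requested.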

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.