I am looking to integrate the ELK stack with an existing Hadoop cluster.
My main goal is to store the logs in HDFS, use ES only for indexing, and show the analytics in Kibana.
I have a couple of questions regarding this approach:
1.) The es-hadoop connector basically reads the data from HDFS and indexes it in ES, thereby duplicating the data. Queries sent to ES are not redirected back to Hadoop. Am I correct?
2.) The indexes are rebuilt every time we run the MR job. How can the data be indexed in "real time", as soon as it is saved to HDFS?
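For context, the batch path I describe in (1) and (2) looks roughly like the sketch below: a map-only MR job that pushes JSON log lines from HDFS into ES via es-hadoop's `EsOutputFormat`. The node address (`es-node:9200`) and index name (`logs/entry`) are placeholders for our setup, and it assumes the log lines are already JSON (hence `es.input.json`).

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.elasticsearch.hadoop.mr.EsOutputFormat;

public class HdfsToEsIndexer {

    // Identity mapper: emits each HDFS log line unchanged; ES indexes it as a document.
    public static class JsonMapper extends Mapper<Object, Text, NullWritable, Text> {
        @Override
        protected void map(Object key, Text value, Context ctx)
                throws IOException, InterruptedException {
            ctx.write(NullWritable.get(), value);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("es.nodes", "es-node:9200");   // placeholder ES address
        conf.set("es.resource", "logs/entry");  // placeholder index/type
        conf.set("es.input.json", "yes");       // log lines are already JSON

        Job job = Job.getInstance(conf, "hdfs-to-es");
        job.setJarByClass(HdfsToEsIndexer.class);
        job.setMapperClass(JsonMapper.class);
        job.setNumReduceTasks(0);               // map-only: no reduce phase needed
        job.setSpeculativeExecution(false);     // avoid duplicate writes to ES
        job.setOutputFormatClass(EsOutputFormat.class);
        job.setMapOutputKeyClass(NullWritable.class);
        job.setMapOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

So today this only indexes whatever is in HDFS at the moment the job runs; nothing picks up files written between runs.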
PS: Our environment doesn't use Storm or Spark Streaming.