I am looking to implement the ELK stack alongside an existing Hadoop cluster.
My main goal is to store the logs in HDFS, use ES only for indexing, and show the analytics in Kibana.
I have a couple of questions regarding this approach:
1.) The es-hadoop connector basically gets the data from HDFS and indexes the data in ES, thereby duplicating the data. The query to ES is not redirected to Hadoop. Am I correct?
2.) The indexes are built every time we run the MR job. How can the data be indexed in "real time", as soon as it is saved in HDFS?
PS: Our environment doesn't use Storm or Spark Streaming.
It is correct that Elasticsearch stores the data on its own and does not redirect queries to Hadoop. If you index into Elasticsearch using MapReduce, there is going to be a delay. When near real-time access to the logs is required, a common approach is to feed the logs into Hadoop and Elasticsearch in parallel instead of relying on the logs first being loaded into Hadoop.
For you to be able to search through Elasticsearch, the data must be stored in Elasticsearch indices. The ES-Hadoop connector allows transfer of data between Hadoop and Elasticsearch, but Elasticsearch does not directly access Hadoop. If you require near real-time access to a subset of your data, the best way is likely to feed it to Elasticsearch at the same time it is fed to Hadoop. Hadoop will still hold all data and be your primary data store.
If this is not possible and you need to write the data to Hadoop first, there will likely be a delay. How long it is depends on how you do the indexing and, as you initially pointed out, MapReduce jobs can be slow.
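To make the parallel-feed idea above a bit more concrete, here is a minimal sketch in Python. It is only an illustration under assumptions: `fan_out`, `to_bulk_body`, the `hdfs_sink` and `es_sink` callables, the `logs` index name, and the `message` field are all hypothetical names, not part of ES-Hadoop or any shipper. The one thing taken from Elasticsearch itself is the newline-delimited layout that the `_bulk` endpoint expects (an action line followed by the document line).

```python
import json
from typing import Callable, Iterable, List


def to_bulk_body(index: str, docs: Iterable[dict]) -> str:
    """Build a newline-delimited JSON body in the layout used by the
    Elasticsearch _bulk endpoint: one action line, then one source line."""
    lines: List[str] = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    if not lines:
        return ""
    # The bulk body must end with a trailing newline.
    return "\n".join(lines) + "\n"


def fan_out(
    log_lines: Iterable[str],
    hdfs_sink: Callable[[str], None],  # hypothetical: e.g. append to a file bound for HDFS
    es_sink: Callable[[str], None],    # hypothetical: e.g. send the body to _bulk
    index: str = "logs",               # assumed index name
) -> int:
    """Send every log line to both destinations and return the line count."""
    batch = list(log_lines)
    # Raw lines go to the primary store unchanged.
    for line in batch:
        hdfs_sink(line)
    # The same lines are wrapped as documents and indexed in one bulk request.
    body = to_bulk_body(index, ({"message": line} for line in batch))
    if body:
        es_sink(body)
    return len(batch)


if __name__ == "__main__":
    hdfs_lines: List[str] = []
    es_bodies: List[str] = []
    fan_out(["GET /a 200", "GET /b 404"], hdfs_lines.append, es_bodies.append)
    print(hdfs_lines)
    print(es_bodies[0], end="")
```

In a real setup the two sinks would be whatever your log shipper already offers; the point is only that the same batch goes to both places at ingest time, so Elasticsearch does not have to wait for a later MapReduce run.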
My Hadoop cluster is in a secured firewall zone and is kerberized. The ES cluster is in a different zone. If I want to use es-hadoop, how do I configure it to use a particular port?
Can es-hadoop transfer data from a kerberized Hadoop cluster to an ES cluster?