I am wondering about the other obvious configuration: one of shared nodes, consisting of a single cluster running both Hadoop and Elasticsearch. This case is mentioned in the Elasticsearch-Hortonworks presentation (slides 31-32):
How would one implement Hadoop and Elasticsearch on the same cluster?
Would there be any trouble between Hadoop's ZooKeeper and Elasticsearch's own cluster management?
No need to repost - if you are looking for real-time replies, try IRC.
What exactly are you looking for? Elasticsearch and Hadoop are services that can share the same hardware. Since ES does not depend on Hadoop or vice versa, you can install each one as you typically would - the more hardware, the better.

You basically end up with two software clusters on top of one hardware cluster. The advantage here is that you can mix and match: you can allocate a different number of machines to ES than to Hadoop, and some machines can be shared while others aren't - it's your choice.

There's no "implementation" to be done - you just install them one by one, and if you want to run your Hadoop jobs against ES, use es-hadoop (the documentation contains plenty of examples for each library you might use, from Map/Reduce to Spark).
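For example, here is a minimal sketch of a Spark job using es-hadoop's native Scala API - the `company/employees` index, the `localhost` address, and the sample documents are made up for illustration, not taken from any particular setup:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._ // adds saveToEs to RDDs and esRDD to SparkContext

object EsHadoopSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("es-hadoop-sketch")
      // Point es-hadoop at the ES nodes; "localhost" assumes ES is
      // co-located on the same machines as the Spark workers.
      .set("es.nodes", "localhost")
      .set("es.port", "9200")

    val sc = new SparkContext(conf)

    // Write an RDD of maps to a hypothetical index.
    val docs = Seq(
      Map("name" -> "Alice", "role" -> "engineer"),
      Map("name" -> "Bob",   "role" -> "analyst")
    )
    sc.makeRDD(docs).saveToEs("company/employees")

    // Read the documents back as an RDD of (id, document) pairs.
    sc.esRDD("company/employees").collect().foreach(println)

    sc.stop()
  }
}
```

Run it with spark-submit and the elasticsearch-spark jar on the classpath. When ES and the Spark workers do share nodes, es-hadoop can exploit shard locality, so co-locating the two clusters can actually help read/write throughput.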