Best practices for integrating ES-Hadoop with CDH5

Hi All,

I have a small CDH5 cluster - 1 manager, 2 name nodes, and 4 data nodes, all standalone servers. What I'd like to do is add ES-Hadoop, Kibana and Fluentd to the cluster. Should ES-Hadoop be installed on all nodes, or is it sufficient to install it on a single node?


It depends on how beefy your hardware is and what your requirements are. ES is quite flexible and can support different topologies:

  • you can install it on separate hardware (aka HW); the advantage here is that it does not interfere with the Hadoop HW and (potentially) has dedicated HW itself (depending on how you configure things). The downside is that data has to go across the network; depending on your setup this may or may not be an issue (1 Gbps is quite potent and often even faster than a traditional HDD).
  • you can install it next to the CDH nodes. You gain data locality; however, the HW is shared by both CDH and ES. Both rely on memory, so you need plenty of it; both are also IO dependent, so you likely want separate partitions for each of them.
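For the colocated scenario, a minimal sketch of the relevant `elasticsearch.yml` settings (the paths are hypothetical - point them at whatever dedicated partition you set aside, away from the HDFS data directories):

```yaml
# Keep ES data on its own partition/disk so it does not contend
# with the HDFS data directories for IO (hypothetical path).
path.data: /mnt/es-data
path.logs: /var/log/elasticsearch

# Lock the JVM heap into RAM so ES is not swapped out while
# competing with the Hadoop daemons for memory (ES 1.x setting name).
bootstrap.mlockall: true
```

Pair `mlockall` with a fixed heap size (e.g. via `ES_HEAP_SIZE`) so ES and the CDH services each get a predictable slice of memory.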

You can also use a mixed scenario - some ES nodes sitting next to CDH, some on different HW. All scenarios are valid; the difference is in performance. The more dedicated HW ES or Hadoop uses, the better each performs; shared HW is cheaper, but IO-wise things might not work as well (simply because there will be some contention, and things like the page cache will be thrashed faster).
Depending on your scenario this may or may not matter - it depends on too many factors to say in general, and doing some basic benchmarks will clarify the situation.

Considering you are starting with a small cluster, I would allocate one node with enough memory and take it from there; you can easily start another one if need be (and ES will automatically take advantage of it with the default settings - namely 5 shards per index).
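To illustrate what the defaults mean, here is a request sketch that makes them explicit when creating an index (the index name `logs` is just an example; run it against your own node):

```shell
# Create an index with 5 primary shards and 1 replica - the same as
# the default settings, just spelled out. On a single node the replica
# shards stay unassigned; as soon as you start a second node, ES
# allocates them there automatically, spreading the load.
curl -XPUT 'http://localhost:9200/logs' -d '{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 1
  }
}'
```

Note that `number_of_shards` is fixed at index creation time, while `number_of_replicas` can be changed later, which is why starting small and growing the cluster works smoothly.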