Elasticsearch on YARN: Node affinity


(Saibal Patra) #1

Current documentation states:

Currently, Elasticsearch on YARN does not provide any option for tying Elasticsearch nodes to specific YARN nodes; however, this will be addressed in the future. In practice, this means that unless YARN is specifically configured, there are no guarantees about its topology between restarts, that is, about which machines the Elasticsearch nodes will run on each time.

I have a Hadoop cluster with 30 data nodes. If I start Elasticsearch on all 30 nodes, I get the data back after a restart, so I can point Kibana at any one of the nodes and it works seamlessly.
But if I choose to start it on only 5 nodes, there is no guarantee which 5 nodes it will restart on, so Kibana has to be reconfigured every time.

How do I configure YARN to solve this problem? Thanks!


(Costin Leau) #2

You can't, at least not with YARN in its current form. This is one of the reasons why YARN support is still in Beta; however, YARN is being updated to support long-running services, and this feature is on the roadmap.
I have no idea, though, when or if it will happen.


(Nww Pot Fung Nng) #3

If one of the 30 nodes is restarted or a new node is added, do we need to do anything to start Elasticsearch on that particular node?

Could you please help? Thanks.

