Using repository-hdfs on (different) YARN cluster

Hey there, question!

Given

  • I have an ES cluster running on top of YARN via elasticsearch-yarn
  • the ES cluster has the plugin repository-hdfs installed

What is the best way to ensure the plugin's Hadoop libs match those on the cluster?

Check out https://github.com/elastic/elasticsearch-hadoop/tree/master/repository-hdfs and build the plugin with the correct flavour/Hadoop version (as described in "Issues with using repository-hdfs plug in for snapshot/restore operation")?

Or is there any way to tell the plugin to use the YARN/Hadoop classpath instead of its own plugin lib folder? (That would be preferred, because I want to run ES on various Hadoop distributions!)

Any input appreciated!

Moving forward, the plugin will work only with Hadoop libraries that are embedded in its lib folder and therefore known. The reason is security: delegating to the classpath means we don't know the code source, that is, which libraries are actually used, as opposed to the plugin/embedded approach, where their location is clearly known.

So while it is a bit more work, selecting the jars and putting them in the plugin folder is the solution regardless of the Hadoop flavor/platform. A symlink or mounted path can reduce the extra work.
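
As a rough illustration of the symlink approach, here is a minimal Python sketch that links every jar from a distribution's Hadoop lib directory into the plugin folder. The function name link_hadoop_jars and both paths in the usage comment are hypothetical and not part of the plugin; adjust them to wherever your distribution keeps its jars and wherever repository-hdfs is installed.

```python
from pathlib import Path


def link_hadoop_jars(hadoop_lib_dir, plugin_dir):
    """Symlink each *.jar in hadoop_lib_dir into plugin_dir.

    Returns the sorted list of jar names that were newly linked.
    Entries that already exist in plugin_dir are left untouched.
    """
    src = Path(hadoop_lib_dir).resolve()
    dst = Path(plugin_dir)
    dst.mkdir(parents=True, exist_ok=True)

    linked = []
    for jar in sorted(src.glob("*.jar")):
        target = dst / jar.name
        # Skip anything already present, including dangling symlinks.
        if target.is_symlink() or target.exists():
            continue
        target.symlink_to(jar)
        linked.append(jar.name)
    return linked


# Hypothetical usage; both paths are assumptions, not defaults of the plugin:
# link_hadoop_jars("/opt/hadoop/share/hadoop/common/lib",
#                  "/usr/share/elasticsearch/plugins/repository-hdfs")
```

The sketch only adds links and never removes anything, so jars left over from another Hadoop version would need to be cleared out first to avoid mixing versions. It would also have to be run on every node, followed by a restart of ES, for the plugin to pick up the jars.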
