I'm currently working on upgrading an application to Elasticsearch 2.1.0 and ran into some problems regarding loading Elasticsearch plugins from the classpath (instead of loading them from the plugins directory).
In Elasticsearch 1.x and 2.0.x it has been possible to load Elasticsearch plugins from the classpath into an embedded Node, but this feature has been removed in Elasticsearch 2.1.0 (see #13055) and it's now only possible to do this for TransportClient via the TransportClient.Builder#addPlugin() method.
Someone filed an issue about it in #13212, and I created a pull request to (re-)introduce a similar method to NodeBuilder, but it was closed (see #15107).
I'm wondering now how I could add a classpath plugin to an embedded Elasticsearch node. For the time being I "cheated" a little bit by creating a sub-class of Node which exposes the constructor which allows passing a collection of plugin classes to be loaded on startup, but this feels kind of dirty.
The reason I can't simply use a TransportClient in my application is that I need real-time information about the cluster state, e.g. when nodes are added or removed, or when indices are created, closed, or deleted. For this I've written a ClusterStateListener that handles those Elasticsearch events and propagates them via an internal EventBus to my application. As far as I can see, with a TransportClient I would need to actively poll for that information at a fixed interval, which would also increase load on the Elasticsearch cluster. Additionally, a client node should be more efficient when indexing large numbers of documents because it knows the cluster topology (the TransportClient can do this too when "sniff" mode is activated, but only with a delay).
The question is: Is there another way to load plugins from the classpath into an embedded Elasticsearch node, other than the "dirty" way I described above?
I'm unsure if this would help, but here is what I'm doing in tests.
Create a MockNode class:
public class MockNode extends Node {

    // these are kept here so a copy of this MockNode can be created, since Node does not store them
    private Version version;
    private Collection<Class<? extends Plugin>> plugins;

    public MockNode(Settings settings, Version version, Collection<Class<? extends Plugin>> classpathPlugins) {
        super(settings, version, classpathPlugins);
        this.version = version;
        this.plugins = classpathPlugins;
    }

    public Collection<Class<? extends Plugin>> getPlugins() {
        return plugins;
    }

    public Version getVersion() {
        return version;
    }
}

Then start a Node this way:
public static void main(String[] args) throws Throwable {
    // keep the builder here; build() is called once, when the node is created
    Settings.Builder settings = Settings.builder();
    final CountDownLatch latch = new CountDownLatch(1);
    final Node node = new MockNode(settings.build(), Version.CURRENT, Collections.singletonList(MyClassPlugin.class));

    // close the node on JVM shutdown, then release the main thread
    Runtime.getRuntime().addShutdownHook(new Thread() {
        @Override
        public void run() {
            node.close();
            latch.countDown();
        }
    });

    node.start();
    // block until the shutdown hook has run
    latch.await();
}
For the time being I "cheated" a little bit by creating a sub-class of Node which exposes the constructor which allows passing a collection of plugin classes to be loaded on startup, but this feels kind of dirty.
The question is whether this is the intended way to do it.
I agree that NodeBuilder and embedding plugins via the classpath is a very handy feature, not least because it was one of the reasons I found writing ES plugins so attractive back in 2010.
If plugins are really no longer allowed on the classpath, the ES team should either remove the stranded NodeBuilder class, which now lacks plugin support, from the public API completely, or send a clear signal to community plugin authors that they should stop creating node-embedded plugins altogether. Making cumbersome API changes without clear reasoning (see https://github.com/elastic/elasticsearch/issues/15060) quickly becomes uncomfortable and hard to follow.
I hope 2.2 will bring some relief regarding NodeBuilder and classpath loading, but it hasn't been released yet.
There are two aspects to consider here. The first is security. We have made huge efforts to lock Elasticsearch down to reduce the chances that malicious code can exploit the Elasticsearch server. This lockdown is enforced via the plugin mechanism which adds:
JarHell check
Class loader isolation
Privileges limited to the bare minimum by the Java security manager
Obviously your own code is not malicious but, if we add the ability for users to just bypass these protections in production, then they will. Making security opt-in defeats the object of the exercise. Getting security right is hard (just look at how many people suffer from JarHell) so if we offer an easy way out then users will choose the easy option and just bypass security.
The second aspect is testing code in the way that it will be used. Elasticsearch used to allow a million different configurations, most of which were untested and many of which had subtle edge cases. We are trying to reduce these options to a limited set which are backed by real tests and which are maintainable and supportable.
In the same way, your plugins should be tested in the way they will be used in the wild, hence using the test framework instead of a quick shortcut option that bypasses the usual checks. This means installing your plugins in the plugins directory and providing a plugin-descriptor.properties file. Client nodes already require a proper config directory as they are real nodes, not just transport clients. Similarly, if a client node requires plugins, those should go into the plugins directory.
Hi Clinton, thanks for the very clear answer. My use case seems somewhat simpler (and more common) than the ones mentioned before, but I can't find any documentation or help on it.
In production, we install our plugins with the bin/plugin script as we have always done, and the applications connect either via HTTP or using the transport client. With ES1.x we had also set up a few tests in our Java code base that would perform some indexing and search operations over test data, using the production settings and mappings. As we use a custom token filter in production, I obviously need the same token filter to be available to the embedded instance that is started while testing.
With ES1, I only needed to add the plugin JAR to the classpath. Is there no simple way to do this with ES2? How can I provide the plugin to the embedded node that, at the moment, is started like this?
Same issue here. Starting a local Node to run several automated tests used to work in previous versions by including the plugin JARs on the classpath. How are we meant to load them from the configured plugins directory now?
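For tests, one way to follow the "plugins directory plus plugin-descriptor.properties" advice above is to stage the plugin into a temporary home directory before starting the embedded node. Below is a minimal sketch, not an official recipe. It uses only the JDK. The class name TestPluginStager, the plugin name, and com.example.MyClassPlugin are hypothetical, and the descriptor keys are the ones I remember from 2.x plugin descriptors, so please check them against the plugin documentation for your exact version.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Properties;

// Hypothetical test helper: lays out <home>/plugins/<name>/ for an embedded test node.
public class TestPluginStager {

    /**
     * Creates <home>/plugins/<name>/ with a plugin-descriptor.properties file and,
     * if pluginJar is not null, a copy of the plugin JAR next to it.
     * Returns the plugin directory.
     */
    public static Path stagePlugin(Path home, String name, String classname,
                                   String pluginVersion, String esVersion,
                                   Path pluginJar) throws IOException {
        Path pluginDir = home.resolve("plugins").resolve(name);
        Files.createDirectories(pluginDir);

        // Descriptor keys as assumed for 2.x; verify against your version's docs.
        Properties descriptor = new Properties();
        descriptor.setProperty("name", name);
        descriptor.setProperty("description", "Staged for embedded-node tests");
        descriptor.setProperty("version", pluginVersion);
        descriptor.setProperty("jvm", "true");
        descriptor.setProperty("site", "false");
        descriptor.setProperty("classname", classname);
        descriptor.setProperty("java.version", "1.7");
        descriptor.setProperty("elasticsearch.version", esVersion);

        Path descriptorFile = pluginDir.resolve("plugin-descriptor.properties");
        try (Writer out = Files.newBufferedWriter(descriptorFile, StandardCharsets.UTF_8)) {
            descriptor.store(out, null);
        }

        // Copying the JAR is skipped when none is given (e.g. when only checking the layout).
        if (pluginJar != null) {
            Files.copy(pluginJar, pluginDir.resolve(pluginJar.getFileName()),
                    StandardCopyOption.REPLACE_EXISTING);
        }
        return pluginDir;
    }

    public static void main(String[] args) throws IOException {
        Path home = Files.createTempDirectory("es-home");
        Path dir = stagePlugin(home, "my-plugin", "com.example.MyClassPlugin",
                "1.0.0", "2.1.0", null);
        System.out.println("Staged plugin in " + dir);
    }
}
```

The embedded node would then be started with its path.home setting pointing at the temporary home directory, so the plugin is picked up from plugins/ the same way it would be in production. I have not verified how this interacts with the JarHell check when the same JAR is also on the test classpath.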
Could we retain all the benefits of the current approach but support a pluggable discovery mechanism for loading plugins? For example, it could be based on URLs (if Elasticsearch is packaged in a WAR file, virtual modules and plugins directories could be navigated via URLs), or on one JAR per plugin containing a plugin-descriptor.properties. In that case the plugin would not be loaded by a separate class loader and the security restrictions probably would not apply, but the rest of the benefits would remain.