Netty transport and shaded jar

Hi,
I'm playing with another netty-based transport (for ES-0.20.2). Ideally,
I'd like to inherit from the existing netty transport and provide a different
client/server bootstrap, pipeline factory, etc.
Since the netty jar is shaded "into" ES, I see two options for using
netty classes:

  1. Use the shaded (renamed) classes already packaged in ES, but then using
    other classes from netty becomes a problem.
  2. Use the original netty classes, but then I'd have to copy/paste the
    existing netty transport code, which becomes more of a burden whenever
    the transport API changes.

Is there a suggested approach for this situation? It would be great if you
could share some insight into how netty is shaded into ES.

Thanks!
Jaguar

--

Unfortunately, in most cases you have to include the original netty in your
plugin, and yes, you have to duplicate code and add some quirks to attach
unshaded netty-based classes to the existing shaded netty-based transport
classes. The reason is that the shaded netty in ES is minimized, so it no
longer contains all netty classes, which makes it hardly usable by plugins.
For example, the websocket classes were dropped by the ES minimization. See my
websocket netty transport plugin at
https://github.com/jprante/elasticsearch-transport-websocket

It is also true that changes in the transport module are challenging for
transport plugin authors. There's not much we can do about it.
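
To make the effect of minimization concrete, here is a minimal sketch of how a plugin could check whether a given netty class is still available under the shaded package before deciding to bundle the original jar. The relocated prefix `org.elasticsearch.common.netty.` and the probed class name are assumptions chosen for illustration, not something confirmed in this thread; adjust them to the ES build you are targeting.

```java
// Sketch only: the package prefixes and the probed class name are assumptions.
final class ShadedClassProbe {

    // Assumed original and relocated (shaded) package prefixes.
    static final String ORIGINAL_PREFIX = "org.jboss.netty.";
    static final String SHADED_PREFIX = "org.elasticsearch.common.netty.";

    /** Maps an original netty class name to the name it would have after relocation. */
    static String relocate(String className) {
        return className.startsWith(ORIGINAL_PREFIX)
                ? SHADED_PREFIX + className.substring(ORIGINAL_PREFIX.length())
                : className;
    }

    /** Returns true if the class can be located on the classpath; the class is not initialized. */
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, ShadedClassProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Example class name from netty's ipfilter package, used here for illustration.
        String original = "org.jboss.netty.handler.ipfilter.IpFilterRuleHandler";
        String shaded = relocate(original);
        System.out.println(shaded + " present: " + isPresent(shaded));
        System.out.println(original + " present: " + isPresent(original));
    }
}
```

If the shaded name is missing but the original one is present, the plugin is relying on its own bundled netty for that class.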

Jörg

--

Just wondering if you could reshade netty to org.elasticsearch.common.*?
That way, missing classes should be available in the classloader.

What do you think?

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs

On 13 Jan 2013, at 03:54, Jörg Prante joergprante@gmail.com wrote:

Unfortunately, in most cases you have to include the original netty in your plugin, and yes, you have to duplicate code and add some quirks to attach unshaded netty-based classes to the existing shaded netty-based transport classes. The reason is that the shaded netty in ES is minimized, so it no longer contains all netty classes, which makes it hardly usable by plugins. For example, the websocket classes were dropped by the ES minimization. See my websocket netty transport plugin at https://github.com/jprante/elasticsearch-transport-websocket

It is also true that changes in the transport module are challenging for transport plugin authors. There's not much we can do about it.

Jörg

--

You still have to include a second dependency for netty in the plugin. In
the plugin zip assembly (which is very weird in itself, since it bypasses
dependency resolution) you could shade netty again, so that you can drop the
unshaded netty jar from the zip (although it is still needed as a dependency
in pom.xml). That is not a solution; it is a workaround that treats the build
system as if it had no dependency resolution.

Why shade plugin code just because ES uses shading? I am not a friend of
shading jars. Shading is evil. Building Elasticsearch without minimization
and shading would be the easiest way to avoid duplicated dependencies and to
put less burden on plugin authors. I also use Guava, Netty, Jackson, Joda
etc. everywhere in my code, but I simply cannot reuse the ES dependencies.

From my understanding, shading is used only to ensure that ES binary
releases are tied to specific versions of the dependencies. As a result, ES
gave up DRY, code reuse, and conflict resolution, while on the other hand
offering a plugin API. For Java plugin developers, creating the plugin zip
distribution is a PITA: all the automatic Maven resolution smartness is
gone, and the dependencies must be collected and repeated manually in the zip
assembly and checked manually for conflicts, as if there were no dependency
resolution. It feels just like the bad old Ant days. I have spent hours
adding boilerplate code and testing my ES plugins only to ensure they
work cleanly, without conflicts with the existing ES code or classpath
issues sooner or later. And it's not just Maven; Gradle, Ivy, and any other
build tool are affected by shading too.

Others also say shading is evil, see
https://issues.jboss.org/browse/WELD-935

The shading issue has motivated me to start a code modularization
project in which the ES dependencies are assigned to submodules and unshaded
jars are the main artifacts (a shaded node jar is just an add-on). I want
to use the reorganized client/server code for RPM packaging efforts. With
shading, RPM packages are harder to create and maintain, and shading is
therefore discouraged.
See http://sochotni.fedorapeople.org/fosdem2011-sochotnicky.pdf

Best regards,

Jörg

--

I have already explained why we shade those jars, so this is kind of
repeating the same discussion, and I haven't heard relevant answers to the
points I raised. The main reasons we do shading are:

  1. Minor versions of libraries sometimes contain significant changes that we
    rely on in ES. Shading makes development much simpler and ends up
    benefiting the user much more. For example, netty changed the way internal
    buffers are handled between minor versions.
  2. Some libraries we simply "hack" here and there, for lack of a better
    word. For example, we undo the change that made fields volatile in
    Joda DateTime, because that was the wrong decision by Joda
    (see my discussion on the Java concurrency list for background). And, by
    the way, it seems this change is going to be reverted in the next version
    of Joda, but we can't wait for it.
  3. I want to reserve the right to do 1 and 2 even for other libraries
    that we don't "touch" today. For example, stats were enabled on caches
    in Guava, and it took me some time to convince them that it was a
    mistake.

So, for all intents and purposes, libraries that are shaded into ES should be
treated as part of the ES codebase, for the reasons above. It's as simple as
that. Once you understand that, other arguments, like Maven dependency
resolution (we don't want it!) or RPM (which is just FUD), are pointless. If
you want to use your own libs in plugins, simply don't use the ES ones
(like guava or netty) and bring your own.

If you are missing some classes on the netty end, for example, we can check
whether we can avoid optimizing those out of netty during shading. I am
perfectly OK with that.

In reply to Jörg Prante's message of Sunday, January 13, 2013 12:35:31 PM UTC+1.

--
--

Hi,
The discussion is interesting. Take my original netty case as an example :). I'm
trying to add an IP address filter for security reasons, and netty already
provides a good enough implementation. It's a bit hard (but still
possible) to add that as a plugin. The real problem is that unused classes
are removed during shading. For this particular case it would be fine if all
classes from netty were included in ES.
The IP filter is a simple case; for a more complex one, such as wrapping the ES
protocol with a customised authentication method, more classes would be
missing from the shaded jar.
If shading is considered the better practice, then I would suggest shading
either nothing or the whole jar.

Cheers!
Jaguar

--

Sorry if my rant was too harsh; that was really not my intention.

I'm aware that chasing all the subtle bugs and wrong decisions in the
upstream development of the dependencies is very hard work, and it is most
admirable. Seeing your suggestions integrated upstream is delightful, and I
hope your improvements will continue to be picked up.

What makes me wonder is that there are other dependencies, like Lucene or the
LGPL-licensed JTS for the geo search, which are also "hacked"/"enriched" for
use by Elasticsearch but are provided as unmodified jars.

My point is, there are other methods to override deficient/unwanted code in
dependencies. For example, patch jars can be added at the front of the
classpath and removed again when the upstream bugs are fixed. The original
dependencies could then be used throughout the build and in plugin
development, while the patched classes take precedence over them at runtime.
For example, OSGi uses "fragments" for this classpath-based approach (I read
about it, I'm not an OSGi expert). As another example, WAR artifacts that
depend on other WAR artifacts often use "overlays", but not shading.
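
The patch-jar idea above can be sketched with a plain `URLClassLoader`, which searches its URLs in the order given, after first asking its parent loader. This is only an illustration of the ordering principle, not how Elasticsearch or OSGi actually does it; the jar file names are hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch only: builds a class loader whose search path lists patch jars first.
final class PatchFirstClasspath {

    /** Returns the search path with all patch jars ahead of the original jars. */
    static URL[] searchPath(List<Path> patchJars, List<Path> originalJars) {
        List<Path> ordered = new ArrayList<>(patchJars);
        ordered.addAll(originalJars);
        URL[] urls = new URL[ordered.size()];
        for (int i = 0; i < urls.length; i++) {
            try {
                urls[i] = ordered.get(i).toUri().toURL();
            } catch (MalformedURLException e) {
                throw new IllegalArgumentException("Not a usable path: " + ordered.get(i), e);
            }
        }
        return urls;
    }

    /**
     * Creates the loader. A class found in a patch jar shadows the same class in an
     * original jar, provided the parent loader cannot see that class itself.
     */
    static URLClassLoader newLoader(List<Path> patchJars, List<Path> originalJars, ClassLoader parent) {
        return new URLClassLoader(searchPath(patchJars, originalJars), parent);
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical jar names, for illustration only.
        List<Path> patches = Collections.singletonList(Paths.get("lib", "joda-time-patch.jar"));
        List<Path> originals = Collections.singletonList(Paths.get("lib", "joda-time.jar"));
        // A null parent means only bootstrap classes are visible through the parent.
        try (URLClassLoader loader = newLoader(patches, originals, null)) {
            for (URL url : loader.getURLs()) {
                System.out.println(url);
            }
        }
    }
}
```

Removing the patch jar from the list once upstream is fixed restores the unmodified library without touching the build dependencies.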

As for creating Elasticsearch RPMs, I was referring to the Fedora Packaging Guidelines:

https://fedoraproject.org/wiki/Packaging:Java

In short, Fedora policy insists on no bundling, no shading, clean
licensing. Quoting: "Many Java projects re-ship their dependencies in their
own releases. This is unacceptable in Fedora. All packages MUST be built
from source and MUST enumerate their dependencies with Requires."

There has been an effort started to package Elasticsearch for Fedora
recently

https://bugzilla.redhat.com/show_bug.cgi?id=902086

Fedora provides Guava, Netty, Jackson, Joda, Sigar RPMs already, and also
Lucene RPMs. I would really appreciate it if I could help to create
Elasticsearch RPMs which can be accepted by Fedora QA.

Best regards,

Jörg

--

I'd say it's more of a case-by-case matter; for example, if you want to add an IP filter, we can add it to the current transports as a feature.

In reply to xiong.jaguar (xiong.jaguar@gmail.com), Jan 23, 2013, 9:12 AM.

--

At the risk of adding nothing to this thread (!), I can't see how
Elasticsearch is shading JAR files according to the legalese I have read.
Elasticsearch doesn't just repackage the contents of JAR files.

It's more like taking chunks of source code, throwing out what isn't
needed, changing what remains for better integration in some places, and
putting it into a completely different classpath. It's more like taking a
snippet from StackOverflow, except that the snippets are much larger!

As for Netty, the real Netty (whose jar I have included in my own servers
that work with ES) is found at org.jboss.netty.*, whereas all ES classes
are found at org.elasticsearch.*. The fact that some class in ES has
"Netty" in its name is interesting but does not cause my build process any
confusion or dependency issues at all.

The same is true of Jackson. Except that in this case, I am perfectly
content with the stream parser subset in ES and so I use the parser subset
embedded inside ES. But the fact that it happens to work mostly like the
"real" Jackson is only interesting (and very useful) but would not
interfere with any dependency on an external Jackson JAR file.

And the bottom line is: Elasticsearch has been 100.0% rock solid for me (it
really has made me look very good!), is tiny enough to run on my MacBook
and yet capable enough to host the myriad different indices that I use for
development, testing, and ES exploration, from exploring the flexibility of
analyzers and mappings, to exploring the capability of 100M+ records (on
that same laptop).

So whatever the ES developers have done, it works beautifully and reliably
(infinitely better than the Oracle JDBC conundrum on OS X). It doesn't
interfere with my "real" Netty JAR dependency, and (quite surprisingly)
provides me with all of the classes I've needed so far... except for the
sole exception of "real" Netty!

In reply to Jörg Prante's message of Wednesday, January 23, 2013 3:29:46 AM UTC-5.