Size of dependencies

John_Bush_3 · November 7, 2015, 8:39pm

I've been working with the ES Spout for Storm and it works great, but I'm concerned about the number of dependencies it brings in. It makes my rather small Storm job around 180MB because I'm pulling in the entire Hadoop ecosystem transitively. My suggestion would be to slice the elasticsearch-hadoop project into smaller jars; someone using it only for Storm doesn't need Pig, Hive, Spark, Parquet, etc.

I know I can put the dependencies on the Storm cluster, but that really breaks the deployment strategy and undermines the whole value of the JVM's isolated classpath, which I need in an organization as big as mine, with jobs being developed by many teams.

I'm going to attempt to size down the dependencies myself, which I expect to be a giant hairball of classpath issues. If anyone else has tips in this arena, I'd love to hear them.

John_Bush_3 · November 7, 2015, 9:02pm

I'll answer my own question: this got the size down by half and it still runs OK.

<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-hadoop</artifactId>
    <version>2.1.0</version>
    <scope>runtime</scope>
    <exclusions>
        <!-- Transitive dependencies excluded to shrink the jar; the job still ran without them -->
        <exclusion>
            <groupId>org.apache.spark</groupId>
            <artifactId>*</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.spark-project</groupId>
            <artifactId>*</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.glassfish</groupId>
            <artifactId>*</artifactId>
        </exclusion>
        <exclusion>
            <groupId>com.twitter</groupId>
            <artifactId>*</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.zookeeper</groupId>
            <artifactId>*</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.apache.hive</groupId>
            <artifactId>*</artifactId>
        </exclusion>
        <!-- Logging libraries -->
        <exclusion>
            <groupId>log4j</groupId>
            <artifactId>*</artifactId>
        </exclusion>
        <exclusion>
            <groupId>org.slf4j</groupId>
            <artifactId>*</artifactId>
        </exclusion>
    </exclusions>
</dependency>

costin · November 7, 2015, 9:03pm

Have you tried using the minimalistic binaries?
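
For anyone landing here later, a minimal sketch of what that could look like in a POM. It assumes the Storm-only artifact is published under the same groupId as `elasticsearch-storm` and that the 2.1.0 version from the snippet above applies; check the elasticsearch-hadoop installation docs for the exact coordinates before relying on it.

```xml
<!-- Assumption: Storm-only "minimalistic" artifact; verify the artifactId and version against the project docs -->
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch-storm</artifactId>
    <version>2.1.0</version>
</dependency>
```

With a per-integration artifact like this, the long exclusion list should no longer be needed, since the Pig, Hive, and Spark integrations are not pulled in to begin with.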

John_Bush_3 · November 7, 2015, 9:04pm

oh nice! that's exactly what I need. Thanks.