What libraries are needed for spark?

chris_snow · April 22, 2016, 4:56pm

I want to be able to read from and write to ES from Spark. I've created an elasticsearch-hadoop-with-dependencies library, but it's 188 MB!

I've stripped out some things that I don't think I need (see my Gradle configuration below). What else can be removed?

dependencies {
    // the elastic search library
    compile('org.elasticsearch:elasticsearch-hadoop:2.3.0') {
        // FIXME: there was an issue zipping up this library so excluding it
        exclude group: 'org.apache.curator', module: 'apache-curator'
    }
}


// create jar file with all dependencies
task('SetupElasticSearchLibs', type: Jar) {
    baseName = 'elastic-search-with-dependencies'
    from {
        configurations.compile.filter {
            !(it.name =~ /spark.*\.jar/) &&
            !(it.name =~ /jetty-all.*\.jar/) &&
            !(it.name =~ /servlet-api.*\.jar/) &&
            !(it.name =~ /pig.*\.jar/)
        }.collect {
            println it
            it.isDirectory() ? it : zipTree(it)
        }
    }
    zip64 = true
    with jar
}

costin · April 26, 2016, 1:42am

es-hadoop by itself doesn't require any libraries; it is designed to reuse the ones already available in Hadoop and Spark at runtime. Take a look at the docs or, better yet, the POM, in particular the scope of the dependencies (they are declared as provided).
In your case, Gradle is pulling Hadoop and Spark into your uber jar, which is unnecessary.
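
A minimal sketch of how the build above might be trimmed along those lines. It assumes the cluster already supplies Spark and Hadoop at runtime, and it uses Gradle's `transitive = false` setting on the dependency as one possible way to keep transitive libraries out of the jar; it is an illustration, not a tested configuration.

```groovy
dependencies {
    // the elastic search library
    compile('org.elasticsearch:elasticsearch-hadoop:2.3.0') {
        // Assumption: Spark and Hadoop come from the cluster at runtime,
        // so es-hadoop's transitive dependencies are not bundled.
        transitive = false
    }
}

// create a jar that contains only the es-hadoop classes
task('SetupElasticSearchLibs', type: Jar) {
    baseName = 'elastic-search-with-dependencies'
    from {
        configurations.compile.collect {
            it.isDirectory() ? it : zipTree(it)
        }
    }
}
```

With nothing else bundled, the repackaging step may not be needed at all: the es-hadoop jar on its own could be passed to the job, for example through `spark-submit --jars`.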
