Es-spark do not work with spark


#1

Hi, all

I use the code snippet from https://www.elastic.co/guide/en/elasticsearch/hadoop/2.4/spark.html to write some data to elasticsearch.

the content of pom.xml is as follow:


org.elasticsearch
elasticsearch-spark_2.10
2.4.0


org.apache.spark
spark-network-common_2.10




org.apache.spark
spark-core_2.10
1.6.1
provided


com.google.collections
google-collections
1.0-rc2

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>1.2.1</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <transformers>
                            <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                <mainClass>com.ghawk.spark.SparkMain</mainClass>
                            </transformer>
                        </transformers>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>

when i compile the project, the log prints follow warnings:
[WARNING] We have a duplicate org/apache/spark/unused/UnusedStubClass.class in /Users/baidu/.m2/repository/org/apache/spark/spark-catalyst_2.10/1.6.1/spark-catalyst_2.10-1.6.1.jar
[WARNING] We have a duplicate com/esotericsoftware/reflectasm/ConstructorAccess.class in /Users/baidu/.m2/repository/com/esotericsoftware/reflectasm/reflectasm/1.07/reflectasm-1.07-shaded.jar
...............

and when i run this program, it throws follow errors:
$ ./bin/spark-submit ghawk-spark-1.0.0-SNAPSHOT.jar
Exception in thread "main" java.lang.SecurityException: Invalid signature file digest for Manifest main attributes
at sun.security.util.SignatureFileVerifier.processImpl(SignatureFileVerifier.java:284)
at sun.security.util.SignatureFileVerifier.process(SignatureFileVerifier.java:238)
at java.util.jar.JarVerifier.processEntry(JarVerifier.java:316)
at java.util.jar.JarVerifier.update(JarVerifier.java:228)
at java.util.jar.JarFile.initializeVerifier(JarFile.java:383)
at java.util.jar.JarFile.getInputStream(JarFile.java:450)
at sun.misc.URLClassPath$JarLoader$2.getInputStream(URLClassPath.java:940)
at sun.misc.Resource.cachedInputStream(Resource.java:77)
at sun.misc.Resource.getByteBuffer(Resource.java:160)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:454)
at java.net.URLClassLoader.access$100(URLClassLoader.java:73)
at java.net.URLClassLoader$1.run(URLClassLoader.java:368)
at java.net.URLClassLoader$1.run(URLClassLoader.java:362)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:361)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.util.Utils$.classForName(Utils.scala:174)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:689)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

it seems elasticsearch-spark_2.10 and spark-core_2.10 do not work well together.
Is there anybody telling me what reason it is? and how to correct it?


#2

i make 2 changes and remove the errors.
1、change maven-shade-plugin version from 1.2.1 to 2.4.3
maven-shade-plugin 2.4.3 will only package one version if classes exist in more than 2 dependent jars.

2、add follow filters for maven-shade-plugin

<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>2.4.3</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <filters>
                            <filter>
                                <artifact>*:*</artifact>
                                <excludes>
                                    <exclude>META-INF/*.SF</exclude>
                                    <exclude>META-INF/*.DSA</exclude>
                                    <exclude>META-INF/*.RSA</exclude>
                                </excludes>
                            </filter>
                        </filters>
                        <transformers>
                            <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
                                <mainClass>com.ghawk.spark.SparkMain</mainClass>
                            </transformer>
                        </transformers>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>

now it index es correctly!


(system) #3