java.lang.UnsupportedOperationException: Not implemented by the TFS FileSystem implementation when starting Spark

sandeep_gunnam · July 17, 2015, 4:19am

Hi,

Below are the details of my setup.
The cluster runs Hadoop 2.6, Spark 1.3.1, and Scala 2.10.4. My code uses the library "elasticsearch-spark_2.10-2.1.0.jar" along with other Spark libraries (such as spark-core_2.10-1.3.1.jar). When I run my code on the cluster using spark-submit, I get the following output, ending in an exception:

2015-07-16 10:21:14,653 INFO Log4j appears to be running in a Servlet environment, but there's no log4j-web module available. If you want better web container support, please add the log4j-web JAR to your web archive or server lib directory.
10:21:14.763 [main] INFO com.philips.bda.spark.SparkDriver$ - kafkaTopics value passed is [Ljava.lang.String;@5a7169a1.
10:21:14.765 [main] WARN com.philips.bda.spark.SparkDriver$ - Spark mode value is not passed. Running in spark-standalone mode.
10:21:14.765 [main] INFO com.philips.bda.spark.SparkDriver$ - Starting the Spark Applications!!!
15/07/16 10:21:15 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.lang.UnsupportedOperationException: Not implemented by the TFS FileSystem implementation
at org.apache.hadoop.fs.FileSystem.getScheme(FileSystem.java:216)
at org.apache.hadoop.fs.FileSystem.loadFileSystems(FileSystem.java:2564)
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2574)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:91)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2630)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2612)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:370)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:169)
at com.philips.hdfs.filesystem.FileSystemFactory$.createFileSystem(FileSystemFactory.scala:27)
at com.philips.hdfs.filesystem.FileSystemFactory$.getFileSystem(FileSystemFactory.scala:20)
at com.philips.bda.spark.SparkDriver$.Setup(SparkDriver.scala:201)
at com.philips.bda.spark.SparkDriver$.main(SparkDriver.scala:79)
at com.philips.bda.spark.SparkDriver.main(SparkDriver.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

I do not understand why Spark is looking for the Tachyon File System. Any help is very much appreciated.

costin · July 17, 2015, 8:31pm

Based on your stacktrace (please use formatting to make the post readable), the culprit seems to be com.philips.bda.spark.SparkDriver$.Setup(SparkDriver.scala:201).
Likely this code tries to bootstrap Spark, which in turn tries to connect to the Tachyon FS (which is probably not needed, but is either requested by the Setup method or not available at runtime).

Either way, there's no elasticsearch-spark code in the stacktrace - in fact, it looks like in your case, the code fails early before any connection to Elasticsearch is done.

Investigate how Spark is started (the Spark docs have plenty of examples) and also your classpath: do you add the jars by hand, or rely on Maven or another build tool to create it? Typically the latter pulls in all dependencies, while the former requires one to cherry-pick libraries, especially during version upgrades.
If I'm not mistaken, Spark 1.4 now relies on Tachyon so it might just be that a jar is missing.
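
One way to check the classpath, as a rough sketch: the stack trace suggests the failure happens while FileSystem.loadFileSystems walks the FileSystem implementations registered through java.util.ServiceLoader, so listing every service file visible on the classpath should show which jar registers the Tachyon TFS class. The object and method names below (ListFileSystemProviders, providers) are made up for illustration; only JDK and Scala standard library calls are used.

```scala
import java.net.URL
import scala.collection.JavaConverters._
import scala.io.Source

// Hypothetical diagnostic helper, not part of Spark, Hadoop or elasticsearch-spark.
object ListFileSystemProviders {
  // Service file that java.util.ServiceLoader consults for Hadoop FileSystem implementations.
  val ServiceFile = "META-INF/services/org.apache.hadoop.fs.FileSystem"

  /** Returns each service file found on the classpath with the class names it declares. */
  def providers(loader: ClassLoader): List[(URL, List[String])] =
    loader.getResources(ServiceFile).asScala.toList.map { url =>
      val src = Source.fromInputStream(url.openStream(), "UTF-8")
      try {
        // Drop comments and blank lines; what remains are implementation class names.
        val classes = src.getLines().map(_.takeWhile(_ != '#').trim).filter(_.nonEmpty).toList
        (url, classes)
      } finally src.close()
    }

  def main(args: Array[String]): Unit =
    providers(Thread.currentThread().getContextClassLoader).foreach { case (url, classes) =>
      println(url) // the URL includes the path of the jar that ships the service file
      classes.foreach(c => println("  " + c))
    }
}
```

Running this with the same classpath as the failing application (for example from the driver's main method) prints one entry per jar; a jar that lists a TFS class there would be the one to look at.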

sandeep_gunnam · July 22, 2015, 6:49am

Thanks for your reply, costin. I posted this question here in the first place because I got this problem only when I added this jar to Eclipse. When I remove the jar and run my code, I don't see the TFS error.
Here's how I got around the problem.
I used elasticsearch-spark_2.10-2.1.0.jar only at compile time and excluded it during packaging with SBT. When running spark-submit, I passed this jar in the jars path, and the error is gone. I'm not sure exactly why, but at least I am not blocked.
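
For anyone trying the same workaround, a minimal build.sbt sketch of it might look like the following. It assumes the "provided" scope is how the jar is kept out of the packaged artifact (the exact exclusion mechanism used above is not stated), and that the jar is then supplied at launch through the --jars option of spark-submit.

```scala
// build.sbt (sketch; assumes "provided" scope is used to keep these jars out of the package)
scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  // Supplied by the cluster's Spark installation at runtime.
  "org.apache.spark" %% "spark-core" % "1.3.1" % "provided",
  // Needed to compile, but passed separately to spark-submit via --jars.
  "org.elasticsearch" %% "elasticsearch-spark" % "2.1.0" % "provided"
)
```

With this scope the connector is on the compile classpath only, so its transitive dependencies are not bundled into the application jar.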

costin · July 22, 2015, 8:04am

This is likely because Eclipse pulls in all dependencies, including the ones marked as provided or optional, which is incorrect. I'm not sure how you create or generate the Eclipse classpath file, but do take this into account.

govardhanraoganji · July 31, 2015, 5:50am

Use spark-assembly-1.3.1-hadoop2.6.0.jar; it helps you overcome the issue.