Elasticsearch-spark connector failing to save data with an Illegal Argument Exception : "No class name given"


(eliasah) #1
Exception in thread "main" org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: No class name given
	at org.elasticsearch.hadoop.util.Assert.hasText(Assert.java:30)
	at org.elasticsearch.hadoop.util.ObjectUtils.instantiate(ObjectUtils.java:32)
	at org.elasticsearch.hadoop.util.ObjectUtils.instantiate(ObjectUtils.java:52)
	at org.elasticsearch.hadoop.util.ObjectUtils.instantiate(ObjectUtils.java:48)
	at org.elasticsearch.hadoop.serialization.bulk.AbstractBulkFactory.initExtractorsFromSettings(AbstractBulkFactory.java:198)
	at org.elasticsearch.hadoop.serialization.bulk.AbstractBulkFactory.<init>(AbstractBulkFactory.java:174)
	at org.elasticsearch.hadoop.serialization.bulk.IndexBulkFactory.<init>(IndexBulkFactory.java:27)
	at org.elasticsearch.hadoop.serialization.bulk.BulkCommands.create(BulkCommands.java:39)
	at org.elasticsearch.hadoop.rest.RestRepository.lazyInitWriting(RestRepository.java:130)
	at org.elasticsearch.hadoop.rest.RestRepository.writeProcessedToIndex(RestRepository.java:174)
	at org.elasticsearch.hadoop.rest.RestRepository.delete(RestRepository.java:549)
	at org.elasticsearch.spark.sql.ElasticsearchRelation.insert(DefaultSource.scala:481)
	at org.elasticsearch.spark.sql.DefaultSource.createRelation(DefaultSource.scala:76)
	at org.apache.spark.sql.execution.datasources.ResolvedDataSource$.apply(ResolvedDataSource.scala:222)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:148)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:139)
    ...

I'm not sure what that error message means. I followed the stack trace, which led me to the following line:

Assert.hasText(className, "No class name given"); // at org.elasticsearch.hadoop.util.ObjectUtils.instantiate(ObjectUtils.java:32)

Information about the data:

scala> data
// res0: org.apache.spark.sql.DataFrame = [doc_id: bigint, eventDate: timestamp, marketObjectId: bigint, eventType: string, merchantId: bigint, userId: bigint]

scala> data.printSchema
// root
//  |-- doc_id: long (nullable = false)
//  |-- eventDate: timestamp (nullable = true)
//  |-- marketObjectId: long (nullable = true)
//  |-- eventType: string (nullable = true)
//  |-- merchantId: long (nullable = true)
//  |-- userId: long (nullable = true)

Code snippet:

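import org.apache.spark.sql.SaveMode

// Connector options: target node and the field to use as the document id
val config: scala.collection.mutable.Map[String, String] =
  scala.collection.mutable.Map(
    "pushdown" -> "true",
    "es.nodes" -> "localhost:9200", // params.esHost
    "es.mapping.id" -> "doc_id"
  )

// Write the DataFrame to the clicks/event resource, overwriting existing data
data.write.format("org.elasticsearch.spark.sql")
  .mode(SaveMode.Overwrite)
  .options(config)
  .save("clicks/event")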

I'm using Spark 1.6.2 with elasticsearch-spark_2.10 v. 5.0.0.BUILD-SNAPSHOT in standalone mode.

{
  "name" : "Callisto",
  "cluster_name" : "elasticsearch",
  "version" : {
    "number" : "5.0.0-alpha4",
    "build_hash" : "3f5b994",
    "build_date" : "2016-06-27T16:23:46.861Z",
    "build_snapshot" : false,
    "lucene_version" : "6.1.0"
  },
  "tagline" : "You Know, for Search"
}

The project is assembled with Maven (mvn assembly:assembly) into an uber-jar, so all the dependencies are available at runtime.

EDIT: Reading data from Elasticsearch works perfectly:

sqlContext.read.format("org.elasticsearch.spark.sql")
          .options(config).load("clicks/event")

How can I fix this?

Any help would be appreciated. Thanks!


(James Baiera) #2

Hi @eliasah,

I took a quick look and was able to reproduce this locally. Could you open an issue on https://github.com/elastic/elasticsearch-hadoop/issues/new for this? This is definitely a bug.


(eliasah) #3

Thanks @james.baiera! I have submitted an issue with some additional information: https://github.com/elastic/elasticsearch-hadoop/issues/837

As I said in the issue, I suspect the SaveMode.Overwrite option is causing the trouble: it may not be handled properly by the connector, which could be the reason for the EsHadoopIllegalArgumentException: No class name given.
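
One way to test that suspicion is to run the same write with SaveMode.Append instead of SaveMode.Overwrite. The stack trace above reaches RestRepository.delete from ElasticsearchRelation.insert, which is at least consistent with the overwrite path being involved. The following is only a sketch and has not been verified against this snapshot build; the object name AppendWorkaround and the method saveWithAppend are hypothetical, not part of the connector, and the options are copied from the original snippet.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

// Hypothetical helper (not part of elasticsearch-spark): performs the same
// write as the original snippet, but with SaveMode.Append.
object AppendWorkaround {

  // Same connector options as in the original post.
  val config: Map[String, String] = Map(
    "pushdown"      -> "true",
    "es.nodes"      -> "localhost:9200",
    "es.mapping.id" -> "doc_id"
  )

  // Assumption: `data` is the DataFrame described above and `resource`
  // is the index/type pair used in the original post.
  def saveWithAppend(data: DataFrame, resource: String = "clicks/event"): Unit = {
    data.write
      .format("org.elasticsearch.spark.sql")
      .mode(SaveMode.Append) // Append instead of Overwrite
      .options(config)
      .save(resource)
  }
}
```

Calling AppendWorkaround.saveWithAppend(data) would not remove documents already in the index, so it is not a drop-in replacement for Overwrite. If it succeeds where Overwrite fails, that would point at the overwrite handling rather than at the data or the mapping.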

