RandomForest Model load error


My Spark program reads data from ElasticSearch (as a JavaPairRDD<String, Map<String, Object>>). This JavaPairRDD is converted to a JavaRDD<LabeledPoint> using the following inner class:

static class Transformer implements
		Function<Tuple2<String, Map<String, Object>>, LabeledPoint> {
	public LabeledPoint call(Tuple2<String, Map<String, Object>> arg0)
			throws Exception {
		HashingTF tf = new HashingTF();
		Map<String, Object> map = arg0._2();

		// get values from Map
		List<Object> valuesList = new ArrayList<Object>();
		for (Object value : map.values()) {
			// assumed: the part of the loop body missing from the post
			// added each value to the list
			valuesList.add(value);
		}
		// every point gets the constant label 1d, as in the original post
		return new LabeledPoint(1d, tf.transform(valuesList));
	}
}

Using this JavaRDD, a RandomForest model is generated and saved to a specific location on the same machine. The model is saved successfully.

But while loading the model I get the following exception:
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:293)
at java.lang.Thread.run(Thread.java:745)
Caused by: org.json4s.package$MappingException: Did not find value which can be converted into java.lang.String
at org.json4s.reflect.package$.fail(package.scala:96)
at org.json4s.Extraction$.convert(Extraction.scala:554)
at org.json4s.Extraction$.extract(Extraction.scala:331)
at org.json4s.Extraction$.extract(Extraction.scala:42)
at org.json4s.ExtractableJsonAstNode.extract(ExtractableJsonAstNode.scala:21)
at org.apache.spark.mllib.tree.model.DecisionTreeModel$.load(DecisionTreeModel.scala:326)
at org.apache.spark.mllib.tree.model.DecisionTreeModel.load(DecisionTreeModel.scala)
at co.nttd.integration.SparkESIntegration.main(SparkESIntegration.java:96)

(Costin Leau) #2

Unfortunately, you are likely to find better answers on the Spark list directly, mainly because both the posted code and the stacktrace are Spark specific (in this case through json4s, a JSON library for Scala that seems to be invoked from Spark).

If you can debug (or enable logging), check what the underlying JSON is: it looks like the returned JSON doesn't match the data model. Since all the libraries are open source, you can easily track down the source and get additional insight into the code execution.
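One way to look at the underlying JSON without a debugger is to print the metadata that was written when the model was saved. The following is only a minimal sketch under two assumptions: the model was saved to the local filesystem (the post says "the same machine"), and Spark MLlib wrote the model metadata as plain-text JSON in part-* files inside a "metadata" subdirectory of the save path. The class name ModelMetadataDump and the method readMetadata are hypothetical names made up for this illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class ModelMetadataDump {

    // Reads every line of the "part-*" files under <modelDir>/metadata.
    // Assumption: the model directory is on the local filesystem and the
    // metadata was written there as plain-text JSON.
    static List<String> readMetadata(Path modelDir) throws IOException {
        List<String> lines = new ArrayList<>();
        Path metadataDir = modelDir.resolve("metadata");
        // The glob skips marker files such as _SUCCESS and hidden checksum files.
        try (DirectoryStream<Path> parts =
                 Files.newDirectoryStream(metadataDir, "part-*")) {
            for (Path part : parts) {
                lines.addAll(Files.readAllLines(part, StandardCharsets.UTF_8));
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ModelMetadataDump <model-directory>");
            return;
        }
        // args[0] is the directory that was passed to the model's save call.
        for (String json : readMetadata(Paths.get(args[0]))) {
            System.out.println(json);
        }
    }
}
```

Run it with the directory the model was saved to and compare the printed JSON with what the loader expects. One possible reading of the trace, not confirmed anywhere in this thread: the model was trained and saved as a RandomForest model, while the failing frame is DecisionTreeModel.load. If the printed metadata names a random forest class, loading through the matching RandomForestModel loader would be the first thing to try.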

Hope this helps,
