Convert JavaPairRDD to JavaRDD


I am fetching data from Elasticsearch using the Elasticsearch-Hadoop library.

JavaPairRDD<String, Map<String, Object>> esRDD = JavaEsSpark.esRDD(sc);

Now I have a JavaPairRDD. I want to use Random Forest from MLlib on this RDD, so I am converting it with JavaPairRDD.toRDD(esRDD), which gives me an RDD. From the RDD I am converting again to a JavaRDD:

JavaRDD<LabeledPoint>[] splits = (JavaRDD.fromRDD(JavaPairRDD.toRDD(esRDD),
            esRDD.classTag())).randomSplit(new double[] { 0.5, 0.5 });

JavaRDD<LabeledPoint> trainingData = splits[0];
JavaRDD<LabeledPoint> testData = splits[1];

I want to pass trainingData and testData to the Random Forest algorithm, but it gives a type-mismatch error at compile time:

Type mismatch: cannot convert from JavaRDD&lt;Tuple2&lt;String,Map&lt;String,Object&gt;&gt;&gt;[] to JavaRDD&lt;LabeledPoint&gt;[]
Could anyone suggest the proper way to do this conversion? I am new to Spark data structures.

(Costin Leau) #2

The connector returns documents. If you need to convert them to a certain structure (such as, in your case, LabeledPoint), simply apply a map operation with your custom function; this is exactly the kind of case Spark was designed for, and where RDDs and lazy, functional collections excel.
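A minimal sketch of that map-based conversion. The field names `label`, `f1`, and `f2` are assumptions about the Elasticsearch document schema; replace them with your own fields. The extraction helpers are plain Java; the Spark `map` call itself is shown in a comment block because it requires Spark and MLlib on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

public class EsToLabeledPoint {

    // Pull a numeric feature vector out of one ES document (a Map).
    // "f1" / "f2" are hypothetical field names.
    static double[] toFeatures(Map<String, Object> doc) {
        return new double[] {
            ((Number) doc.get("f1")).doubleValue(),
            ((Number) doc.get("f2")).doubleValue()
        };
    }

    // "label" is likewise an assumed field holding the class label.
    static double toLabel(Map<String, Object> doc) {
        return ((Number) doc.get("label")).doubleValue();
    }

    /*
    // With Spark/MLlib on the classpath, the conversion is one map over esRDD.
    // tuple._1() is the ES document id, tuple._2() the document body:
    JavaRDD<LabeledPoint> points = esRDD.map(tuple -> {
        Map<String, Object> doc = tuple._2();
        return new LabeledPoint(toLabel(doc), Vectors.dense(toFeatures(doc)));
    });
    JavaRDD<LabeledPoint>[] splits = points.randomSplit(new double[] { 0.5, 0.5 });
    */
}
```

The key point is that no cast can turn a `Tuple2<String, Map<String, Object>>` element into a `LabeledPoint`; the element type only changes when `map` builds a new RDD of the target type.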


Thanks Costin. It worked!
