Spark SQL and complex types

I am using Spark 1.3.1 with Spark SQL to access some Twitter data stored in Elasticsearch. The problem I am having has to do with complex types. For example, the following SQL runs, but I can't seem to retrieve the content: "SELECT twitterstatus.user.name FROM twitterstatus".

Both approaches below work if complex types are NOT referenced.
// Approach 1: collect the result rows directly on the driver.
DataFrame dataFrame = sqlContext.sql("SELECT twitterstatus.user.name FROM twitterstatus");
List<Row> aIP = dataFrame.collectAsList();

///////////////////////////
DataFrame dataFrame = sqlContext.sql("SELECT twitterstatus.user.name FROM twitterstatus");
List aIP = dataFrame.toJavaRDD().map(new Function<Row, Object>() {
    @Override
    public Object call(Row row) {
        return row;
    }
}).collect();

But when complex types are referenced, I see the following exception "java.lang.ClassCastException: scala.collection.mutable.LinkedHashMap cannot be cast to org.apache.spark.sql.Row".

Do you have a suggestion for retrieving these complex types from Elasticsearch?

Arthur1

I have noticed the same thing. Is this the recommended way to access nested Elasticsearch data via Spark, or am I on the wrong track? Would someone mind giving me some assistance?

This may be a little late, but try using esJsonRDD instead. It takes a bit more code, but it can handle this problem. My guess is that the Map[String,AnyRef] representation is what causes a LinkedHashMap to be used.
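
As a rough illustration of the map-based route, here is a minimal plain-Java sketch that reads a nested field such as user.name from a document held as nested java.util.Map values instead of a Row. It assumes each document reaches your code in that shape (my understanding is that JavaEsSpark.esRDD in elasticsearch-hadoop hands back Map<String, Object> values, but please verify that for your version). The class NestedFieldLookup and the method getPath are hypothetical names made up for this example, not part of Spark or elasticsearch-hadoop.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Spark or elasticsearch-hadoop.
public class NestedFieldLookup {

    // Walks a dotted path such as "user.name" through nested maps.
    // Returns null if a key is missing or an intermediate value is not a map.
    public static Object getPath(Map<String, Object> doc, String dottedPath) {
        Object current = doc;
        for (String key : dottedPath.split("\\.")) {
            if (!(current instanceof Map)) {
                return null;
            }
            current = ((Map<?, ?>) current).get(key);
        }
        return current;
    }

    public static void main(String[] args) {
        // Stand-in for one document; the field values here are invented.
        Map<String, Object> user = new LinkedHashMap<>();
        user.put("name", "example_user");
        Map<String, Object> status = new LinkedHashMap<>();
        status.put("user", user);

        System.out.println(getPath(status, "user.name")); // prints "example_user"
    }
}
```

The same lookup could be applied inside a map() call over the RDD of documents, so that only the extracted values are collected on the driver.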