I've got a Spark job that starts with a single Elasticsearch query. I want to use the result of that query within the same Spark job to build a second query that pulls down more data, based on what the first one brings back.
I've attempted to do this with the following code:
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
conf.set("es.query", query)
val sc = new SparkContext(conf)
// Spark logic to determine the second Elasticsearch query
val newElasticQuery = ...
val newConf = new SparkConf()
newConf.set("es.query", newElasticQuery)
val newSparkContext = SparkContext.getOrCreate(newConf)
The issue is that the second read does not use the new query but the original conf, so the same data is pulled down again.
You can only have one active SparkContext per JVM, so your call to SparkContext.getOrCreate returns the previously created context, original conf and all, rather than building a new one. You will need to specify these settings on an RDD-by-RDD basis instead: you can pass them to the RDD creation call as a Map.
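A minimal sketch of that approach, assuming the elasticsearch-spark connector (which adds sc.esRDD through the org.elasticsearch.spark._ import); the resource name myindex/mytype, the first query string, and the ids-based second query are purely illustrative:

import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

// Connection settings such as es.nodes are assumed to be configured
// elsewhere (e.g. on the SparkConf); no es.query is set here.
val sc = new SparkContext(new SparkConf().setAppName("es-two-queries"))

// First read: the query travels with the esRDD call, not the SparkConf.
val first = sc.esRDD("myindex/mytype", Map("es.query" -> "?q=status:active"))

// Derive the second query from the first result set. esRDD yields
// (documentId, fieldMap) pairs, so here we collect some ids and wrap
// them in an ids query (purely illustrative).
val ids = first.keys.take(100).map(id => s""""$id"""").mkString(",")
val newElasticQuery = s"""{"query":{"ids":{"values":[$ids]}}}"""

// Second read against the same SparkContext, with its own settings Map.
val second = sc.esRDD("myindex/mytype", Map("es.query" -> newElasticQuery))

Because each esRDD call carries its own configuration Map, nothing about the shared SparkContext has to change between the two reads.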