Multiple Elasticsearch queries per Spark job

(Matthew Jones) #1

Hi all,

I've got a Spark job that starts with a single Elasticsearch query. I want to use its result to build a second query within the same Spark job and pull down more data, based on what the first one returns.

I've attempted to do this using the following code:

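  import org.apache.spark.{SparkConf, SparkContext}

  // First query, set globally on the SparkConf
  val conf = new SparkConf()
  conf.set("es.query", query)

  val sc = new SparkContext(conf)

  // Second query, built from the results of the first
  val newElasticQuery = ...
  val newConf = new SparkConf()
  newConf.set("es.query", newElasticQuery)

  // Attempt to get a context that uses the second query
  val newSparkContext = SparkContext.getOrCreate(newConf)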

The problem is that the second context still uses the original conf rather than the new query, so the same data is pulled again.

How would you go about doing this?


(James Baiera) #2

You can only have one SparkContext active per JVM, so your last call to SparkContext.getOrCreate returns the previously created SparkContext and ignores the new conf. You will need to specify these settings on a per-RDD basis instead. You should be able to pass those settings to the RDD create call via a Map.

(Matthew Jones) #3

Thanks for your reply! That makes sense; I misunderstood what getOrCreate was actually doing.

I'm still having trouble, though. Can you give me an example of how you would apply this via a Map?

(James Baiera) #4

The last example in the Scala portion of this section in the docs shows the syntax:

EsSpark.saveToEs(rdd, "index/type", Map("setting" -> "value"))
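
The line above is the write side. For the read side, which is what the original question needs, the same idea should apply: pass a settings Map to esRDD so that each RDD carries its own es.query. The following is only a minimal sketch under some assumptions: the object name, the helpers querySettings and followUpQuery, the "?q=*" first query, the limit of 100 ids, and the choice of an ids query for the second step are all made up for illustration, and "index/type" is just the placeholder used above.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._ // adds esRDD(...) to SparkContext

object TwoStepEsQuery {
  // Per-RDD settings: these apply only to the RDD they are passed to,
  // so nothing has to change on the shared SparkConf.
  def querySettings(query: String): Map[String, String] =
    Map("es.query" -> query)

  // Hypothetical follow-up: fetch documents by the ids returned by the
  // first query. Ids are assumed not to contain quote characters.
  def followUpQuery(ids: Seq[String]): String =
    ids.map(id => "\"" + id + "\"")
      .mkString("""{"query":{"ids":{"values":[""", ",", "]}}}")

  def main(args: Array[String]): Unit = {
    // One SparkContext for the whole job (master is assumed to come from spark-submit).
    val sc = new SparkContext(new SparkConf().setAppName("two-step-es-query"))

    // First read: esRDD yields (document id, document fields) pairs.
    val first = sc.esRDD("index/type", querySettings("?q=*"))
    val ids = first.map(_._1).take(100).toSeq

    // Second read: same context, different query, supplied via the Map.
    val second = sc.esRDD("index/type", querySettings(followUpQuery(ids)))
    println(second.count())

    sc.stop()
  }
}
```

Collecting the ids on the driver is only reasonable when the first result is small; for a large first result a different way of narrowing the second query would be needed.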

(Matthew Jones) #5

Got it working, thank you very much!
