Sub-queries with Spark and ES-Hadoop


#1

Hey, first time here, so apologies if this is a duplicate.

I use ES-Hadoop to read an entire index as an RDD, but I need to perform sub-queries inside my mapPartitions.
I query the whole index, then iterate over it with mapPartitions, but for each item I need to run another query against a different index.

For example, I have an index of dogs and another index of dog relations (their band). The dogs index is the one I read with ES-Hadoop as an RDD, and I need to query the relations index for each dog separately.

How can I solve this without performing a query per dog?
I want to reduce the number of queries as much as possible, but I don't know how to merge these RDDs in a smart way.
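For reference, here is a minimal sketch of what I do today (Scala with the elasticsearch-spark connector; the `dog_id` field, index names, and the localhost endpoint are placeholders for my real setup):

```scala
import java.net.{HttpURLConnection, URL}
import scala.io.Source
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

val sc = new SparkContext(new SparkConf().setAppName("per-dog-lookup"))

// Read the whole dogs index as (docId, fieldsMap) pairs.
val dogs = sc.esRDD("dogs")

// Current approach: one ES search per dog, issued from inside each partition.
val withRelations = dogs.mapPartitions { partition =>
  partition.map { case (dogId, fields) =>
    // One HTTP search per document -- exactly the query volume I want to avoid.
    val url = new URL(s"http://localhost:9200/dogs-relations/_search?q=dog_id:$dogId")
    val conn = url.openConnection().asInstanceOf[HttpURLConnection]
    val body = Source.fromInputStream(conn.getInputStream).mkString
    conn.disconnect()
    (dogId, fields, body)
  }
}
```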

BTW - I opened a Stack Overflow question about this.

Any idea will help,
Thanks.


(James Baiera) #2

@shaimr you could always try a join operation in Spark using RDDs sourced from both indices.
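A minimal sketch of that, assuming the elasticsearch-spark connector and a `dog_id` field on the relation documents (names are placeholders):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

val conf = new SparkConf()
  .setAppName("es-rdd-join")
  .set("es.nodes", "localhost:9200") // assumption: your ES endpoint

val sc = new SparkContext(conf)

// esRDD returns RDD[(docId, Map[String, AnyRef])] for each index.
val dogs      = sc.esRDD("dogs")
val relations = sc.esRDD("dogs-relations")

// Re-key the relations on the shared key, then let Spark do a
// distributed shuffle join -- no per-document ES queries.
val relsByDog = relations.flatMap { case (_, fields) =>
  fields.get("dog_id").map(v => (v.toString, fields))
}

val joined = dogs.join(relsByDog) // RDD[(dogId, (dogFields, relationFields))]
```

The join is shuffle-based and can spill to disk, so neither side has to fit in the memory of a single executor.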


#3

The first index contains 50M documents and the relation index contains 300M documents.
Isn't that too much data for an in-memory join?
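If it matters, I guess I could trim what gets read before joining. A sketch, assuming the standard `es.read.field.include` setting and my placeholder field names:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

// Read only the fields the join actually needs, so far less data is shuffled.
val conf = new SparkConf()
  .setAppName("es-join-trimmed")
  .set("es.nodes", "localhost:9200")               // assumption: ES endpoint
  .set("es.read.field.include", "dog_id,relation") // assumed field names

val sc = new SparkContext(conf)

// A query can also be pushed down so irrelevant docs never leave ES
// ("relation:pack" is just an illustrative filter).
val relations = sc.esRDD("dogs-relations", "?q=relation:pack")
```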


(system) #4

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.