How does es-hadoop get data?

(t7s) #1

I know that es-hadoop locates the shards to read data from. But when it reads from those shards, does es-hadoop fetch all the data and leave Hadoop/Spark to filter it down to the wanted records, or does it send the query to the shards so that only the wanted data comes back?

For example, when I use es-hadoop to find records that satisfy gender=male, are those records returned by the shards directly, or does es-hadoop pull all the data from the target shards and then obtain the result by iterating over the whole data set?

(Costin Leau) #2

This is explained in the reference documentation, in particular in the architecture chapter.
In short, es-hadoop gets only the data it needs from each shard; it would be highly inefficient and frankly pointless to fetch all the data and filter it in memory (why would it do that when ES can do all of this itself?).
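
As a rough illustration of the gender=male case from the question, here is a minimal Python sketch of a read configuration that hands the filter to Elasticsearch. `es.nodes`, `es.resource` and `es.query` are es-hadoop settings; the index name `people`, the node address and the helper `build_es_read_conf` are assumptions made up for this example, and a `match` query is just one way to express the filter.

```python
import json


def build_es_read_conf(nodes, resource, field, value):
    """Build an es-hadoop read configuration (hypothetical helper).

    The filter is placed in es.query, so Elasticsearch evaluates it on
    each shard and only matching documents are sent to the job.
    """
    query = {"query": {"match": {field: value}}}
    return {
        "es.nodes": nodes,               # assumed address of an ES node
        "es.resource": resource,         # index to read from (assumed name)
        "es.query": json.dumps(query),   # query DSL executed by ES itself
    }


conf = build_es_read_conf("localhost:9200", "people", "gender", "male")
print(conf["es.query"])

# With PySpark and the es-hadoop jar on the classpath, the configuration
# could then be passed along these lines (not executed here):
#
# rdd = sc.newAPIHadoopRDD(
#     inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
#     keyClass="org.apache.hadoop.io.NullWritable",
#     valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
#     conf=conf,
# )
```

Without `es.query`, the job would read every document in the index; with it, the filtering happens inside Elasticsearch before anything reaches Hadoop/Spark.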
