Search / aggregate across multiple ES clusters?

hi,

i'm just starting out with elasticsearch. (very cool project, btw!)

i'd like to get some advice on how best to search multiple independent
ES clusters (each with common indices/types) from a central point and
aggregate the results.

here's my scenario: i have several remote groups of servers
that generate data that i'd like to search using ES. because of
various constraints (firewalls, multiple customers w/independent
servers, etc.), it would be difficult to have ES nodes in each group
connect together to form one large "ES cluster."

however, it would be easy (in theory) to connect from a central server
to the independent groups, execute a search on a dedicated ES
cluster in the group, then aggregate the results. (this would allow us
to execute searches on subsets of clusters, which is
important.) ...but the aggregation part is the unknown for me.

since each cluster would use common naming conventions for indices and
types, perhaps we could aggregate results from separate ES clusters as
ES does behind-the-scenes for results from multiple nodes within a
cluster? (i have yet to dig into that part of the code.)

or, are there any features like this (to work with multiple
independent clusters w/common indices/types) on the horizon?

...any help / pointers / etc. would be greatly appreciated.

thanks, tom

Heya,

There is no support for aggregating results across clusters. You can try
to aggregate them yourself, but that would be hard when it comes to facets
and the like, and you would lose functionality. There is no plan to support
this...

-shay.banon
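
For anyone who wants to try the do-it-yourself route mentioned above, here is a minimal sketch of client-side merging. It assumes each cluster has already been queried separately (for example via its own _search endpoint) and that the parsed JSON responses have the usual hits / facets layout. The function names merge_hits and merge_term_facet and the added _cluster key are hypothetical, chosen only for illustration. Two caveats show why this loses functionality: _score values are computed from each cluster's own term statistics, so they are not strictly comparable across clusters, and summing terms-facet counts is only approximate because each cluster returns just its own top terms.

```python
import heapq
from collections import Counter


def merge_hits(responses, size=10):
    """Merge per-cluster search responses into one top-`size` hit list.

    `responses` maps a cluster name (our own label) to that cluster's
    parsed search response.
    """
    total = 0
    tagged = []
    for cluster, resp in responses.items():
        total += resp["hits"]["total"]
        for hit in resp["hits"]["hits"]:
            # Tag each hit with its origin, since _id values may
            # collide across independent clusters.
            tagged.append(dict(hit, _cluster=cluster))
    # Caveat: scores come from per-cluster statistics, so this
    # ordering is only a rough approximation of a global ranking.
    top = heapq.nlargest(size, tagged, key=lambda h: h["_score"])
    return {"total": total, "hits": top}


def merge_term_facet(responses, facet, size=10):
    """Sum terms-facet counts across clusters (approximate).

    A term that falls outside one cluster's returned top terms is
    simply missing from that cluster's contribution.
    """
    counts = Counter()
    for resp in responses.values():
        for entry in resp["facets"][facet]["terms"]:
            counts[entry["term"]] += entry["count"]
    return counts.most_common(size)
```

Searching a subset of clusters then just means passing a smaller `responses` dict. Asking each cluster for more hits and more facet terms than you finally need reduces, but does not remove, the approximation error.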

On Mon, Jul 18, 2011 at 10:24 PM, tj tomjohnson3@gmail.com wrote:
