Aggregations in ES

Barak_Yaish · January 31, 2011, 6:38am

Hello,

I was wondering: is there a way to fetch aggregated data from ES so
that ES can be used for reports? For example, there is a huge index
composed of two fields, a string and a counter (the string field is
not unique, so the same string appears more than once). I would like
to sort that index and determine which strings have the highest
aggregated counters. One way would be to fetch the entire index (using
pagination) and aggregate the data on the client side, but that won't
work for a huge index (memory problems). Another way would be to
optimize the index at build time: before inserting new data, check
whether it already exists in the index and update its counter. But then
index creation would be very slow. Is there some other way I missed?

Thanks.
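
To make the goal concrete, here is a minimal plain-Java sketch of the aggregation described above, run on a toy in-memory list rather than against ES. The Doc class, the topByTotal helper, and the sample data are illustrative assumptions and not part of any ES API; the sketch only shows the result being asked for (the sum of the counter per distinct string, highest first).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CounterTotals {

    // Hypothetical document shape: one string field and one numeric counter.
    static final class Doc {
        final String text;
        final long counter;

        Doc(String text, long counter) {
            this.text = text;
            this.counter = counter;
        }
    }

    // Sums the counter per distinct string, then returns the top n strings
    // ordered by their total, highest first.
    static List<Map.Entry<String, Long>> topByTotal(List<Doc> docs, int n) {
        Map<String, Long> totals = new HashMap<>();
        for (Doc d : docs) {
            totals.merge(d.text, d.counter, Long::sum);
        }
        List<Map.Entry<String, Long>> sorted = new ArrayList<>(totals.entrySet());
        sorted.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(n, sorted.size())));
    }

    public static void main(String[] args) {
        // Toy data: "foo" appears twice, so its counters are added together.
        List<Doc> docs = Arrays.asList(
                new Doc("foo", 3), new Doc("bar", 10),
                new Doc("foo", 9), new Doc("baz", 1));
        for (Map.Entry<String, Long> e : topByTotal(docs, 2)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

The map holds one entry per distinct string, so memory grows with the number of distinct strings rather than the number of documents. That is still the client-side approach the question wants to avoid for a huge index.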

Lukas_Vlcek1 · January 31, 2011, 8:08am

Hi,

I think the easiest way for you is to use the terms facet:
http://www.elasticsearch.com/docs/elasticsearch/rest_api/search/facets/terms_facet/

Regards,
Lukas

Barak_Yaish · January 31, 2011, 9:19am

Thanks, a few follow-up questions:

It seems that the Facet interface does not expose term() and
count(). I browsed the data by casting to InternalIntTermsFacet. Is
this a design decision?

I did get the counters ordered
( addFacet( FacetBuilders.termsFacet("facet1").field("counter").size(1500000).order( ComparatorType.REVERSE_TERM ) ) ).
Is there a way to fetch the data associated with each counter from
the facet, or does that require a second fetch from the index?

Thanks.

kimchy · January 31, 2011, 8:03pm

On Monday, January 31, 2011 at 11:19 AM, barak wrote:

> Thanks, few follow-up questions:
>
> seems like the Facet interface does not expose the term() and
> count(), I browsed the data by casting to InternalIntTermsFacet, is it
> design decision?

You need to cast it to the relevant interface, for example TermsFacet.

> I did get the counters ordered
> ( addFacet( FacetBuilders.termsFacet("facet1").field("counter").size(1500000).order( ComparatorType.REVERSE_TERM ) ) ),
> is there a way to fetch from the facet the associated data of the
> counter, or it requires a second fetch from the index?

I think what you are after is a feature that other people have asked for on the mailing list: not a count of how many times each term appeared, but a combination of the stats facet and the terms facet. It is being worked on.
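
The difference between the two results can be sketched in plain Java on toy data, without touching ES. Everything here (the Doc class, the two helper methods, the sample values) is a hypothetical illustration and not the ES facet API: the first helper mimics what a terms facet on the string field reports (documents per term), the second mimics the "stats per term" combination (statistics of the counter field grouped by the string field).

```java
import java.util.Arrays;
import java.util.List;
import java.util.LongSummaryStatistics;
import java.util.Map;
import java.util.TreeMap;

class TermsVersusStats {

    // Hypothetical document shape: a string field plus a numeric counter.
    static final class Doc {
        final String text;
        final long counter;

        Doc(String text, long counter) {
            this.text = text;
            this.counter = counter;
        }
    }

    // Terms-facet-like result: how many documents contain each term.
    static Map<String, Long> docCountPerTerm(List<Doc> docs) {
        Map<String, Long> counts = new TreeMap<>();
        for (Doc d : docs) {
            counts.merge(d.text, 1L, Long::sum);
        }
        return counts;
    }

    // "Stats per term" result: count, sum, min, max and average of the
    // counter field, grouped by the term of the string field.
    static Map<String, LongSummaryStatistics> counterStatsPerTerm(List<Doc> docs) {
        Map<String, LongSummaryStatistics> stats = new TreeMap<>();
        for (Doc d : docs) {
            stats.computeIfAbsent(d.text, k -> new LongSummaryStatistics())
                 .accept(d.counter);
        }
        return stats;
    }

    public static void main(String[] args) {
        List<Doc> docs = Arrays.asList(
                new Doc("foo", 3), new Doc("bar", 10), new Doc("foo", 9));
        Map<String, Long> counts = docCountPerTerm(docs);
        Map<String, LongSummaryStatistics> stats = counterStatsPerTerm(docs);
        for (String term : counts.keySet()) {
            System.out.println(term
                    + " docs=" + counts.get(term)
                    + " total=" + stats.get(term).getSum());
        }
    }
}
```

With this data, "foo" has a document count of 2 but a counter total of 12. Only the second number answers the original question about which strings have the highest aggregated counters.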

Barak_Yaish · February 1, 2011, 9:29am

Can you estimate when this feature will be available?

Thanks.
