I was wondering: is there a way to fetch aggregated data from ES so that
ES can be used for reports? For example, there is a huge index
composed of two fields, a string and a counter (the string field is
not unique and appears more than once). I would like to sort that index
and determine which strings have the highest (aggregated) counters. One
way would be to fetch the entire index (using pagination) and
aggregate the data on the client side, but that won't work for a huge
index (memory problems). Another way would be to optimize the index at
build time: before inserting new data, check whether it already exists in
the index and update its counter, but index creation would be very
slow. Is there some other way I missed?
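For reference, the client-side approach described above (page through the index and sum the counters per string) looks roughly like the Java sketch below. It is a minimal illustration only: the pages are hypothetical in-memory stand-ins for batches returned by paginated searches, and no Elasticsearch API is called. The map of totals grows with the number of distinct strings, which is where the memory problem comes from; the min-heap only avoids sorting every total at the end.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Client-side aggregation sketch: sum the counter per string across pages,
// then keep only the top N totals. The pages are hypothetical in-memory
// stand-ins for results fetched from the index with pagination.
class TopCounters {

    // One indexed document: a non-unique string and its counter.
    static final class Doc {
        final String term;
        final long counter;
        Doc(String term, long counter) { this.term = term; this.counter = counter; }
    }

    // Accumulates totals page by page. Memory grows with the number of
    // DISTINCT strings, which is the problem described for a huge index.
    static Map<String, Long> aggregate(List<List<Doc>> pages) {
        Map<String, Long> totals = new HashMap<>();
        for (List<Doc> page : pages) {
            for (Doc d : page) {
                totals.merge(d.term, d.counter, Long::sum);
            }
        }
        return totals;
    }

    // Returns the n entries with the highest totals, highest first,
    // using a min-heap of size n instead of sorting every entry.
    static List<Map.Entry<String, Long>> topN(Map<String, Long> totals, int n) {
        PriorityQueue<Map.Entry<String, Long>> heap =
            new PriorityQueue<Map.Entry<String, Long>>(Map.Entry.<String, Long>comparingByValue());
        for (Map.Entry<String, Long> e : totals.entrySet()) {
            heap.offer(e);
            if (heap.size() > n) {
                heap.poll(); // drop the smallest total kept so far
            }
        }
        List<Map.Entry<String, Long>> result = new ArrayList<>(heap);
        result.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return result;
    }

    public static void main(String[] args) {
        List<List<Doc>> pages = List.of(
            List.of(new Doc("a", 3), new Doc("b", 5)),
            List.of(new Doc("a", 4), new Doc("c", 1)),
            List.of(new Doc("b", 1), new Doc("c", 2)));
        Map<String, Long> totals = aggregate(pages);
        // Prints "a 7" then "b 6".
        for (Map.Entry<String, Long> e : topN(totals, 2)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```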
On Monday, January 31, 2011 at 11:19 AM, barak wrote:
Thanks, a few follow-up questions:
It seems like the Facet interface does not expose term() and
count(); I browsed the data by casting to InternalIntTermsFacet. Is that a
design decision?
You need to cast it to the relevant interface, for example, TermsFacet.
I did get the counters ordered using
addFacet(FacetBuilders.termsFacet("facet1").field("counter").size(1500000).order(ComparatorType.REVERSE_TERM)).
Is there a way to fetch the data associated with each counter from the facet,
or does that require a second fetch from the index?
I think what you are after is a feature other people have asked for on the mailing list: not the count of how many times each term appeared, but rather a combination of the stats facet and the terms facet. It is being worked on.
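To make that distinction concrete, here is a plain-Java, in-memory sketch of the two results. It illustrates semantics only and is not how the facet feature is implemented; the Doc class and sample data are hypothetical, and no Elasticsearch API is used. termsOnCounter mirrors what a terms facet on the counter field reports: how many documents carry each distinct counter value, ordered by term descending (which appears to be the effect of REVERSE_TERM in the question), with no link back to the strings. statsPerTerm shows the requested combination: group by the string, compute stats on its counters, and order the groups by total.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.LongSummaryStatistics;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// In-memory illustration (no Elasticsearch calls) of two different results.
class TermsStatsSketch {

    // Hypothetical document: a non-unique string and its counter.
    static final class Doc {
        final String term;
        final long counter;
        Doc(String term, long counter) { this.term = term; this.counter = counter; }
    }

    // Terms-facet semantics on the "counter" field: counter value -> doc count,
    // ordered by term descending. The strings are not part of the result.
    static Map<Long, Long> termsOnCounter(List<Doc> docs) {
        Map<Long, Long> counts = new TreeMap<Long, Long>(Comparator.<Long>reverseOrder());
        for (Doc d : docs) {
            counts.merge(d.counter, 1L, Long::sum);
        }
        return counts;
    }

    // "Terms + stats" semantics: group by the string, summarize its counters
    // (count, sum, min, max, average), and order the groups by sum descending.
    static List<Map.Entry<String, LongSummaryStatistics>> statsPerTerm(List<Doc> docs) {
        Map<String, LongSummaryStatistics> byTerm = docs.stream()
            .collect(Collectors.groupingBy((Doc d) -> d.term,
                     Collectors.summarizingLong((Doc d) -> d.counter)));
        List<Map.Entry<String, LongSummaryStatistics>> entries = new ArrayList<>(byTerm.entrySet());
        entries.sort(Comparator.comparingLong(
            (Map.Entry<String, LongSummaryStatistics> e) -> e.getValue().getSum()).reversed());
        return entries;
    }

    public static void main(String[] args) {
        List<Doc> docs = List.of(
            new Doc("a", 3), new Doc("b", 5), new Doc("a", 4),
            new Doc("c", 5), new Doc("b", 1));
        System.out.println(termsOnCounter(docs)); // {5=2, 4=1, 3=1, 1=1}
        for (Map.Entry<String, LongSummaryStatistics> e : statsPerTerm(docs)) {
            LongSummaryStatistics s = e.getValue();
            System.out.println(e.getKey() + " total=" + s.getSum()
                + " count=" + s.getCount() + " max=" + s.getMax());
        }
    }
}
```

Doing this client-side still has the memory issue from the original question; the point of a server-side terms-plus-stats feature would be to compute the per-string totals inside the index instead.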