Significant terms aggregation too slow for me


(Srinivasan Ramaswamy) #1

I am trying to use the significant terms aggregation, but it is making my
searches very slow. Is there any optimization I can do to make it faster?
I have an index with 24 shards and 1 replica, where each shard is 2.5 GB.
With the significant terms aggregation turned on, many searches take ~5 s
(even when the same search is repeated); with it disabled, they take only
~150 ms.

I am using it as follows:

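import static org.elasticsearch.search.aggregations.AggregationBuilders.significantTerms;
import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.search.aggregations.bucket.significant.SignificantTermsBuilder;

SearchRequestBuilder srb = ...;
// Ask for the 11 most significant terms in the "tags" field
SignificantTermsBuilder tags = significantTerms("st_name").field("tags").size(11);
srb.addAggregation(tags);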

Does anyone have any hints on how to optimize this feature? Is there some
level of caching involved? If there is, it shouldn't take ~5 s when the
same query is executed again and again, should it?

Thanks
Srini

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/500cd549-cb72-4409-a93b-33789fd18fbe%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Mark Harwood-2) #2

Hi Srini,

(and apologies for the delay in replying; I only just spotted this message)

There is indeed a level of caching in the design: all of the terms for a
field are loaded into RAM using FieldData, which lets us look up the terms
in individual docs very quickly.
However, the stats required for looking up how frequently terms occur in
the background (typically your whole corpus) come from hitting the Lucene
APIs, which read frequencies from the Lucene index on disk. Generally the
cost of doing this grows with the number of unique terms in your result set.
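As a rough illustration of the two costs described above, here is a minimal Java sketch. It is a hypothetical model, not Elasticsearch's code: `fieldData` stands in for the in-RAM FieldData (doc to terms), and the made-up `BackgroundIndex` class stands in for the on-disk Lucene frequency lookups, counting one lookup per call. It shows that foreground counting only touches RAM, while the number of background lookups equals the number of unique terms in the result set. Assumes Java 9+ for `List.of`/`Map.of`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical cost model of significant-terms analysis; NOT Elasticsearch's implementation.
public class SigTermsCostModel {

    // Stand-in for on-disk background stats: every call counts as one "disk" lookup.
    static final class BackgroundIndex {
        private final Map<String, Integer> docFreq;
        int lookups = 0;

        BackgroundIndex(Map<String, Integer> docFreq) {
            this.docFreq = docFreq;
        }

        int docFreq(String term) {
            lookups++;
            return docFreq.getOrDefault(term, 0);
        }
    }

    // fieldData.get(doc) plays the role of FieldData: each doc's terms are already in RAM.
    // Returns the number of background lookups performed.
    static int analyse(List<Integer> matchingDocs,
                       List<List<String>> fieldData,
                       BackgroundIndex background) {
        // Cheap phase: count foreground frequencies from in-memory terms.
        Map<String, Integer> foreground = new HashMap<>();
        for (int doc : matchingDocs) {
            for (String term : fieldData.get(doc)) {
                foreground.merge(term, 1, Integer::sum);
            }
        }
        // Expensive phase: one background lookup per unique term in the result set.
        // A real scorer would compare the foreground and background frequencies here.
        for (String term : foreground.keySet()) {
            background.docFreq(term);
        }
        return background.lookups;
    }

    public static void main(String[] args) {
        List<List<String>> fieldData = List.of(
                List.of("java", "search"),
                List.of("java", "lucene"),
                List.of("search", "shards", "lucene"));
        BackgroundIndex bg = new BackgroundIndex(
                Map.of("java", 500, "search", 800, "lucene", 120, "shards", 40));
        int lookups = analyse(List.of(0, 1, 2), fieldData, bg);
        // 4 unique terms (java, search, lucene, shards) -> 4 lookups
        System.out.println("background lookups: " + lookups);
    }
}
```

Adding more matching docs that reuse the same terms leaves the lookup count unchanged in this model; adding docs with new terms raises it.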

We are currently looking at ways of improving this. For now, one approach
may be to limit the size of the result set presented to the sig_terms agg
for analysis. Generally speaking, the quality of suggestions can still be
good on smaller (but not too small) sets of relevant results, and arguably
the quality can even go down if the agg is analysing result sets that
include a long tail of garbage.
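A minimal, hypothetical sketch of what limiting the result set buys: keep only the N highest-scoring hits before collecting terms. The `Hit` record, `topN`, `uniqueTerms` and the sample data are made up for illustration and are not an Elasticsearch API; records need Java 16+. In the sample, the two low-scoring hits play the role of the long tail, and dropping them cuts the unique terms (and so the background lookups) from 7 to 3.

```java
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical illustration of analysing only the top-N hits; NOT an Elasticsearch API.
public class TopHitsSample {

    record Hit(double score, List<String> tags) {}

    // Number of unique terms across the hits, which drives the background lookups.
    static int uniqueTerms(List<Hit> hits) {
        Set<String> terms = new HashSet<>();
        for (Hit h : hits) {
            terms.addAll(h.tags());
        }
        return terms.size();
    }

    // Keep only the n highest-scoring hits.
    static List<Hit> topN(List<Hit> hits, int n) {
        return hits.stream()
                .sorted(Comparator.comparingDouble(Hit::score).reversed())
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit(9.1, List.of("lucene", "search")),
                new Hit(8.7, List.of("lucene", "shards")),
                new Hit(0.4, List.of("cats", "recipes")),      // long-tail noise
                new Hit(0.2, List.of("weather", "football"))); // long-tail noise
        System.out.println("all hits: " + uniqueTerms(hits) + " unique terms");
        System.out.println("top-2 hits: " + uniqueTerms(topN(hits, 2)) + " unique terms");
    }
}
```

How you restrict the result set in practice depends on your query and Elasticsearch version; the sketch only models the effect on the number of terms to analyse.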

Hope this makes sense
Mark

(In reply to Srinivasan Ramaswamy's post of Thursday, May 29, 2014, 6:21:01 PM UTC+1.)

