How much JVM heap are you giving ES, and how large are the sets?
It looks like in 1.4 you will be able to control the circuit breakers more
via config. However, depending on your data set size, I am guessing you are
still going to have to think about what you can allocate to the ES heap,
since that page seems to indicate the circuit breakers default to
reasonably high percentages.
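If I am reading the docs right, the relevant 1.4 settings would look roughly like this in elasticsearch.yml (the percentages here are what I believe the defaults to be, so double-check them against the docs for your version before relying on them):

```yaml
# Circuit breaker limits, expressed as a percentage of JVM heap.
# Values shown are my understanding of the 1.4 defaults -- verify before use.
indices.breaker.fielddata.limit: 60%   # cap on loaded fielddata
indices.breaker.request.limit: 40%     # cap on per-request structures (e.g. aggregations)
indices.breaker.total.limit: 70%       # combined cap across all breakers
```

Lowering these trips the breaker earlier (failing the request instead of risking an OOM); raising them buys deeper analysis at the cost of more heap pressure.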
I am trying to look into the scalability characteristics of this feature
myself, because it is interesting for some goals I have, but I don't see any
information about how it scales or what it is bound by. In my case I would
like to be able to analyse foreground sets of tens to hundreds of thousands
of documents against a background set of millions. Since I haven't found
anything documented, your numbers might give me an idea of whether my use
case is crazy or reasonable, prior to getting some testing done with it.
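For context, the kind of request I have in mind looks something like the sketch below (the index, type-of-query, and field names are all made up for illustration; the foreground set is whatever the query matches, and the background set defaults to the whole index):

```json
POST /reports/_search?search_type=count
{
  "query": {
    "match": { "category": "crime" }
  },
  "aggregations": {
    "significant_crime_terms": {
      "significant_terms": { "field": "description" }
    }
  }
}
```

The heap cost grows with the number of unique terms in the field being analysed, which is presumably what trips the circuit breaker on large sets.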
On Friday, September 5, 2014 3:19:13 AM UTC-5, Christoffer Vig wrote:
The significant terms aggregation is a really great feature that allows
for some really interesting data analysis. We quite often experience out of
memory errors: "CircuitBreakingException: Data too large, data would be
larger than limit", which is not hard to understand given the amount of
data and the speed involved.
I think it would be interesting if it were possible to "trade off" speed to
allow deeper analysis: to run significant terms, and possibly other
aggregations, for as long as needed, just to return some (presumably
correct) results.