Sorry if this has already been posted somewhere, but is there an
(approximate) statement of theoretical memory usage for multi-field
(string) facets anywhere? (I'd be particularly interested in a comparison
vs memory usage for facets on nested children for reasons mentioned below)
I had to turn facets off on most of my platforms a few months ago because
we had insufficient memory (even using nested facets and putting documents
containing the highest X% cardinality in a separate index) - I replaced
them with manual calculations on a subset of the data.
Obviously reverting back to using facets would be fantastic, but since it's
a reasonable amount of effort to jump to 0.90, which isn't scheduled for a
few months yet, it would be really helpful to be able to estimate what the
new memory usage per shard would be (eg given X documents, each containing
an array of average size Yavg, with Z unique values across the shard and
each element being average size T bytes).
eg the old version was something like (X*Ymax + Z*T)264B, where obviously
the X*Ymax term rapidly became the dominating factor.
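To show why the X*Ymax term dominates, here is a rough Python sketch of the two terms in that old-version estimate. The function name and the sample numbers are made up for illustration, and the constant multiplier is left out, so it only shows the relative size of the two terms rather than real heap figures.

```python
def old_facet_terms(x_docs, y_max, z_unique, t_bytes):
    """Return the two raw terms of the rough pre-0.90 estimate.

    Hypothetical helper: the constant multiplier is deliberately omitted,
    so the results only indicate which term dominates.
    """
    per_doc_slots = x_docs * y_max    # X*Ymax: grows with the largest array in the shard
    value_bytes = z_unique * t_bytes  # Z*T: the unique values themselves
    return per_doc_slots, value_bytes


# Made-up shard: 1M docs, largest array has 1000 entries,
# 50k unique values of about 20 bytes each.
slots, value_bytes = old_facet_terms(
    x_docs=1_000_000, y_max=1000, z_unique=50_000, t_bytes=20
)
print(slots, value_bytes)
```

With these invented numbers the per-document term is three orders of magnitude larger than the unique-value term, and it is driven by Ymax rather than Yavg.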
I started going through the new code in some spare time to see if I could
work it out, but all those software engineering tricks made it tricky on a
phone UI.
On Monday, March 4, 2013 8:31:11 AM UTC-5, Clinton Gormley wrote:
On Mon, 2013-03-04 at 05:25 -0800, Sujoy Sett wrote:
Thanks Clint.
We have a combination of 3-4 filters decided upon at run-time to find
the necessary subset of data; I guess it won't be easy for us to
partition the data considering all those filters .......
Had this document subset been a static one, a separate index could
have worked easily.
You may find that only 3% of your docs have got high numbers of values
for a particular field. Those are the ones you want to move to a
separate index.
eg if you have 100 docs with 2 values in a field, and 1 doc with 1000
values, then you get a matrix of 100 * 1000.
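As a rough illustration of that arithmetic, the sketch below counts matrix cells for the example, first with all docs in one index and then with the outlier moved out. The dense_cells helper is a hypothetical name, and it only models the number_of_docs * max_number_of_values layout described in this thread, not the actual ES data structures.

```python
def dense_cells(value_counts):
    """Cells in a number_of_docs * max_number_of_values matrix for one index.

    value_counts holds the number of values each doc has in the faceted field.
    """
    if not value_counts:
        return 0
    return len(value_counts) * max(value_counts)


small_docs = [2] * 100   # 100 docs with 2 values each
outlier = [1000]         # 1 doc with 1000 values

combined = dense_cells(small_docs + outlier)             # everything in one index
split = dense_cells(small_docs) + dense_cells(outlier)   # outlier in its own index
actual = sum(small_docs + outlier)                       # values that really exist

print(combined, split, actual)
```

Counting the outlier doc itself, the single index needs 101 * 1000 cells for only 1200 real values, while the split layout needs 200 + 1000 cells.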
clint
-- Sujoy.
On Monday, March 4, 2013 4:13:08 PM UTC+5:30, Clinton Gormley wrote:
On Sun, 2013-03-03 at 22:42 -0800, Sujoy Sett wrote:
> Thanks Clint.
>
> So we got the problem. But is there any workaround to achieve the
> same?
> Would upgrading to 0.20 be helpful in any way for this?
No, although the next version of ES (0.90+) will help with this problem.
For the moment, what about keeping those docs in a separate index?
clint
>
>
> -- Sujoy.
>
>
> On Friday, March 1, 2013 5:32:55 PM UTC+5:30, Clinton Gormley wrote:
> Hiya
>
> > Following is the problem case. I have an index with 35000 docs and I
> > want to facet on a particular high cardinality field (=~ 100) on this
> > index.
> > I have an associated facet filter, which should always filter out some
> > 200 documents from this index upon which I want my facet to be run.
> >
> > Applying query/filter in a separate search query to retrieve those 200
> > docs takes around 10 ms.
> > Using facet with facet-filter (with the same condition) on the same is
> > giving either heap-space error or query timeout after 60 secs.
> > Initially I thought high cardinality was causing the problem, but
> > when I separated out those 200 docs in a separate index and executed
> > facet on that particular field, facet results were within 5 ms.
> >
> > My assumption is that facet-filter first filters out the matching
> > documents, and field values for those docs only are loaded in memory
> > for faceting.
>
> That assumption isn't correct. The field values are loaded for all docs
> in the index. And, if the field has multiple values, then (in ES <
> 0.90) it creates a matrix of number_of_docs * max_number_of_values
>
> I'm guessing that you have a large number of values per field, hence the
> memory usage. It also explains why, when you index those docs into a
> separate index, your heap usage doesn't explode.
>
> clint
>
>
>
>
> --
> You received this message because you are subscribed to the Google
> Groups "elasticsearch" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to elasticsearc...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.