Sorry if this has already been posted somewhere, but is there an
(approximate) statement of theoretical memory usage for multi-value
(string) facets anywhere? (I'd be particularly interested in a comparison
vs memory usage for facets on nested children, for reasons mentioned below.)

I had to turn facets off on most of my platforms a few months ago because
we had insufficient memory (even using nested facets and putting documents
containing the highest X% cardinality in a separate index); I replaced
them with manual calculations on a subset of the data.

Obviously reverting to facets would be fantastic, but since it's a
reasonable amount of effort to jump to 0.90, which isn't scheduled for a
few months yet, it would be really helpful to be able to estimate what the
new memory usage per shard would be (e.g. given X documents containing an
array of average size Yavg, with Z unique values across the shard, each
element being T bytes on average).

e.g. the old version was something like (X*Ymax + Z*T)*264B, where the
X*Ymax term obviously rapidly became the dominating factor.

I started going through the new code in some spare time to see if I could
work it out, but all those software engineering tricks made it tricky on a
phone UI.
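The back-of-envelope formula above can be written as a tiny helper for plugging numbers in (purely my own sketch: the formula shape and the 264 B per-entry constant come from the estimate in this post, not from the ES source, so treat the result as a rough order of magnitude):

```python
def old_facet_memory_estimate(x_docs, y_max, z_unique, t_bytes, entry_bytes=264):
    """Back-of-envelope pre-0.90 estimate: (X*Ymax + Z*T) * entry_bytes.

    x_docs:   X, documents in the shard containing the field
    y_max:    Ymax, maximum number of values in the field per document
    z_unique: Z, unique values across the shard
    t_bytes:  T, average bytes per value
    """
    # The X*Ymax (docs * max-values) matrix term rapidly dominates
    # as soon as even one document carries many values.
    return (x_docs * y_max + z_unique * t_bytes) * entry_bytes
```

Plugging in sample figures makes the dominance of the matrix term obvious: with 35,000 docs and a single 1000-value outlier, X*Ymax alone is 35,000,000 entries before any per-entry cost is applied.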
On Monday, March 4, 2013 8:31:11 AM UTC-5, Clinton Gormley wrote:
On Mon, 2013-03-04 at 05:25 -0800, Sujoy Sett wrote:
Thanks Clint.
We have a combination of 3-4 filters decided upon at run-time to find
the necessary subset of data; I guess it won't be easy for us to
partition the data considering all those filters.
Had this document subset been a static one, a separate index could
have worked easily.
You may find that only 3% of your docs have got high numbers of values
for a particular field. Those are the ones you want to move to a
separate index.
eg if you have 100 docs with 2 values in a field, and 1 doc with 1000
values, then you get a matrix of 100 * 1000.
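Clint's example is easy to verify numerically. Here is a minimal sketch of the pre-0.90 sizing rule (number_of_docs * max_number_of_values); note that counting the outlier doc itself the matrix is 101 * 1000, and this only counts matrix entries, not the per-entry byte cost:

```python
def ordinal_matrix_entries(values_per_doc):
    # Pre-0.90 multi-value field data sizes as
    # number_of_docs * max_number_of_values (entries, not bytes).
    if not values_per_doc:
        return 0
    return len(values_per_doc) * max(values_per_doc)

mixed = [2] * 100 + [1000]   # 100 docs with 2 values, 1 doc with 1000 values
low_only = [2] * 100         # same docs after moving the outlier out

print(ordinal_matrix_entries(mixed))     # 101000 entries (101 docs * 1000)
print(ordinal_matrix_entries(low_only))  # 200 entries (100 docs * 2)
```

Moving the single high-value doc out shrinks the matrix for the remaining docs by a factor of 500, which is exactly the motivation for the separate-index suggestion.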
clint
-- Sujoy.
On Monday, March 4, 2013 4:13:08 PM UTC+5:30, Clinton Gormley wrote:
On Sun, 2013-03-03 at 22:42 -0800, Sujoy Sett wrote:
> Thanks Clint.
>
> So we got the problem. But is there any workaround to achieve the same?
> Would upgrading to 0.20 be helpful in any way for this?
No, although the next version of ES (0.90+) will help this problem.

For the moment, what about keeping those docs in a separate index?

clint

> -- Sujoy.
>
> On Friday, March 1, 2013 5:32:55 PM UTC+5:30, Clinton Gormley wrote:
>         Hiya
>
>         > Following is the problem case. I have an index with 35000 docs and I
>         > want to facet on a particular high-cardinality field (=~ 100) on this
>         > index.
>         > I have an associated facet filter, which should always filter out some
>         > 200 documents from this index upon which I want my facet to be run.
>         >
>         > Applying a query/filter in a separate search query to retrieve those
>         > 200 docs takes around 10 ms.
>         > Using a facet with a facet filter (with the same condition) on the same
>         > index gives either a heap-space error or a query timeout after 60 secs.
>         > Initially I thought the high cardinality was causing the problem, but
>         > when I separated out those 200 docs into a separate index and executed
>         > the facet on that particular field, the facet results were within 5 ms.
>         >
>         > My assumption is that the facet filter first filters out the matching
>         > documents, and field values for only those docs are loaded in memory
>         > for faceting.
>
>         That assumption isn't correct. The field values are loaded for all docs
>         in the index. And, if the field has multiple values, then (in ES < 0.90)
>         it creates a matrix of number_of_docs * max_number_of_values.
>
>         I'm guessing that you have a large number of values per field, hence the
>         memory usage. It also explains why, when you index those docs into a
>         separate index, your heap usage doesn't explode.
>
>         clint
>
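The point Clint makes above (field data is loaded for every doc in the index, no matter how selective the facet filter is) can be sketched with the thread's numbers; the 50-values-per-doc figure below is a purely hypothetical stand-in for illustration:

```python
def fielddata_entries(docs_in_index, max_values_per_doc):
    # A facet filter restricts which docs are COUNTED, not which docs are
    # LOADED: pre-0.90 field data covers every doc in the index.
    return docs_in_index * max_values_per_doc

MAX_VALUES = 50  # hypothetical max values per doc, for illustration only

# Faceting 200 filtered docs still loads field data for all 35,000 docs...
whole_index = fielddata_entries(35_000, MAX_VALUES)
# ...while a dedicated 200-doc index loads only its own docs.
small_index = fielddata_entries(200, MAX_VALUES)

print(whole_index // small_index)  # 175x more field-data entries
```

That ratio is why the same facet that times out on the full index comes back in 5 ms on the 200-doc index.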
--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.