Understanding the effects of "low memory" (not OOM) on nodes (or: should I just add a new node to my cluster and get on with my life?!)

Summary: After loading facets, CPU runs high at idle even with a fair
bit of memory left - is this expected? And can someone sanity check my
field cache sizes?

My pseudo-operational cluster consists of 3 nodes, each with 10GB of
memory allocated (and mlockall set - in any case I have no swap space).

When my entire document set is loaded (2.5M top-level documents,
various embedded and nested sub-objects), bigdesk shows the store
sizes as: 6.1GB, 4.3GB and 3.7GB (these sizes vary from re-index to re-
index as you'd expect, but that sort of distribution is typical). Each
shard has 1 replica (60 shards in total).

The fun starts when I start to run facets.

First off, I perform a term facet over all geo elements stored in the
documents (the "total" field = 2163442). In terms of field cache, the
3 nodes then have the following usage: 2.8GB, 2.5GB and 1GB.

First question: does that seem about right for the field cache taken
up by 2M 14-character strings? That comes out as ~3KB per geo field
instance, much more per unique token value.
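
(For reference, my arithmetic there is just the combined field cache
divided by the facet's total count - a quick Python sketch with the
numbers above:)

# back-of-envelope for the "~3KB per geo field instance" figure
field_cache_bytes = (2.8 + 2.5 + 1.0) * 1e9   # combined field cache across the 3 nodes
geo_values = 2163442                          # the facet's "total" count
print(field_cache_bytes / geo_values)         # ~2900 bytes, i.e. roughly 3KB per value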

Next I perform a term facet on a different string field (average
length ~64B), part of a "nested" object ("total"=1809227). This
increases the field cache usage to: 6.3GB, 3.3GB and 2.8GB.

Looking at "top" on the three nodes, they have the following memory
usage (Virt/Res in GB): 10.1/8.1, 10.1/6.2, 10.1/5.8.

The node with 10.1/8.1 now uses 40% CPU at idle. Nothing is logged
(debug level), and none of the timed events in nodes stats show
anything unusual. Queries still return normally, though performance is
noticeably degraded (the average search time jumps by 1s or so).

I tried setting "index.cache.field.type" to both "resident" (i.e. the
default) and "soft". The behavior seems the same in both cases.

For what it's worth, the "offending" node contains all 10 shards from
the largest index, though only 3 of them are "primary".

Second question: Is this expected behavior?

Bonus questions: If so, does it just indicate it's time to add a new
node, or do I have other options left? Is there any way of detecting
this state other than monitoring CPU trends (i.e. so I can send out an
alert email)?

Many thanks for any insight anyone can provide!

Hey,

Trying to analyze high CPU is a tricky business… I don't see a reason why you would see this behavior when no requests are being issued (specifically with the (default) resident type). If you do simulate it again, can you use jstack (http://docs.oracle.com/javase/1.5.0/docs/tooldocs/share/jstack.html) and gist the stack trace that you get? It might help narrow down why it happens.
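
(If it's easier to script, something like this - a minimal sketch that
assumes you know the Elasticsearch process id - will produce a dump you
can gist:)

# minimal sketch: capture a thread dump with jstack; the pid is a placeholder
import subprocess

es_pid = 12345  # substitute the Elasticsearch JVM's process id
dump = subprocess.check_output(["jstack", str(es_pid)], text=True)
with open("es-threaddump.txt", "w") as f:
    f.write(dump)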

Thanks - it actually stopped after a few days in "soft" mode. The RES
memory on all three nodes is >85% of the VIRT, but the field cache
sizes have reduced to 3.4GB, 1.8GB and 0.7GB with lots of cache
evictions - interesting (perhaps the search cache booted the field
cache out because nobody was doing facets on the big indexes?). When
nobody's using the cluster and I get the chance, I'll restart and
repeat my steps to make it happen again.

By the way - for anyone interested in estimating the field cache
memory usage, I had a look at the code and it appears to be (I assume
per shard - not per index - and per replica, though I didn't check) ~

8B * num_docs * max(distinct_field_values_per_doc)

Shay - can you confirm?! (https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/elasticsearch/index/field/data/support/FieldDataLoader.java)

(e.g. if you had 1 doc with 1 geotag and 1 doc with 1000 geotags, in an
index with 1 shard and no replicas, the memory usage would be
8*(1+1)*1000 = 16KB)
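
(As throwaway Python, that estimate looks like the sketch below - purely
my reading of the code, so treat it as a guess:)

# my guess at the field cache size for a multi-valued field, per copy of the index
def field_cache_guess(num_docs, max_values_per_doc, bytes_per_entry=8):
    return bytes_per_entry * num_docs * max_values_per_doc

# the toy example above: 1 doc with 1 geotag + 1 doc with 1000 geotags,
# 1 shard, no replicas
print(field_cache_guess(2, 1000))  # 16000 bytes, i.e. ~16KB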

In my 2 cases above (just looking at my largest index, which has 1
replica (==2 instances), containing 2.3M/2.5M documents):

  • geo (1 doc has 154 geotags) = 8B * 2.3M * 154 * 2 = 5.6GB (I
    actually seemed to use 6.3GB)
  • nested field (1 doc has 335 values) = 8B * 2.3M * 335 * 2 = 12.3GB
    (I actually seemed to use only 6.1GB extra on top of geo, not sure if
    nested facets work differently though)

So it doesn't quite match up ... but assuming memory usage is actually
per shard, you should usually do better in practice, depending on the
distribution of array lengths across documents.

If this is remotely accurate, the moral of the story would be: if you
expect documents to have mostly a small number of values for a given
field, but with the occasional peak (aka most distributions!), you
could save massively on memory by allocating the peaks to their own
index (or, more easily but suboptimally, just putting them all in 1
shard using the "?routing" parameter?!).
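
(Rough sketch of what I mean - the index names, threshold and doc type
are all made up and I haven't actually tried this; ?routing is just the
stock Elasticsearch routing parameter:)

# hypothetical: send "peak" documents to their own index, or - cruder -
# pin them all onto one shard of the main index via ?routing
import json, requests

GEO_THRESHOLD = 100  # made-up cut-off for "peak" documents

def index_doc(doc_id, doc, es="http://localhost:9200"):
    if len(doc.get("geo", [])) > GEO_THRESHOLD:
        url = "%s/docs_big/doc/%s" % (es, doc_id)            # separate index for the peaks
        # url = "%s/docs/doc/%s?routing=big" % (es, doc_id)  # or: one shard via routing
    else:
        url = "%s/docs/doc/%s" % (es, doc_id)
    requests.put(url, data=json.dumps(doc))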

(Edit: I'd forgotten I'd chopped out a load of uninteresting values
from my nested field - in fact the second max number was 116 not 335,
so the estimated memory usages are actually both pretty close to the
actual values, phew!)

Sorry to interrupt this thread, but I am surprised that you are not
replying to my query. I have asked my query 10 times but still no
answer, yet you are busy with other queries. Is there anybody else who
can answer my query?
Please let me know. Reference: "Bounding Box query is very slow"
Regards
Prashant

This is for "Shay Banon"

Alex, I am not sure I followed your reasoning. The actual computation of the size that the field data uses depends on the type of the field data itself (you will see that each type has a different compute-size method).

I did not follow your reasoning on shards and sizing - do you mean multi-valued fields in the same doc?

Shay -

I was just looking at multi-valued fields (which will usually dominate
field cache memory compared to single-valued fields, I think - at
least that's the case for me by 1-2 orders of magnitude)

From my quick hunt through the code it looked like the dominating
factor in memory usage for multi-valued field data wasn't the types
themselves but the "ordinals" array of arrays of ints created in the
FieldDataLoader, which maps term indices to document ids (and is
independent of the data type).

I assume that in addition to this ordinals member there is type-
specific storage, but with memory usage that does not explicitly scale
with the number of documents, hence largely irrelevant for larger
indexes with multi-valued fields (and, I assume, for very large indexes
with a smallish number of distinct terms).
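
(A rough model of how I read that structure - again, not verified
against the actual code:)

# ordinals as I read it: roughly one slot per document per value position,
# independent of field type - conceptually an int[max_values_per_doc][num_docs]
def ordinals_bytes(num_docs, max_values_per_doc, bytes_per_slot):
    # a Java int is 4 bytes; object overhead and the type-specific value
    # storage push the effective cost per slot higher in practice
    return bytes_per_slot * num_docs * max_values_per_doc

# my largest index: 2.3M docs, one doc with 154 geotags, 8B/slot as guessed earlier
print(ordinals_bytes(2_300_000, 154, 8) / 1e9)  # ~2.8GB for one copy of the index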

(I also assumed/guessed, but didn't explicitly check, that the field
cache simply caches these composite objects - type-specific storage
plus the ordinals array - using the Google CacheBuilder class.)

It seems like my case - several multi-valued fields, which can have
large-ish arrays per document - is unusual?

Incidentally, <1% of my documents have arrays of >= 25% of the max
array length (per field), so I can easily get at least a 4x memory
saving by putting the "big" docs in their own index (or shard?).

Yea, sadly with multi-valued fields per doc you will get a hit in terms of memory usage, because we need to keep track of all the references to the values.

One option that you might want to play with, is to use nested documents for all the different values. It will mean more documents in Lucene (since each nested object is another doc in Lucene), but on the other hand, it means no longer having multiple values per doc. In this case, you will need to start using nested queries / filters and nested execution of facets to get the counts you want. I would give it a go on a simple test env to see if it makes a difference.
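
Roughly something like this - the field names and doc type are made up
and the bodies are from memory, so double-check them against the docs
for your version:

# hypothetical mapping: store the multi-valued field as nested objects
mapping = {
    "doc": {
        "properties": {
            "tags": {
                "type": "nested",
                "properties": {
                    "value": {"type": "string", "index": "not_analyzed"}
                }
            }
        }
    }
}

# and facet on it with nested execution, so counts come from the nested docs
facet_request = {
    "query": {"match_all": {}},
    "facets": {
        "tag_values": {
            "terms": {"field": "tags.value"},
            "nested": "tags"
        }
    }
}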

Hmm, interestingly, the nested field memory usage seems to follow the
same pattern (in the above examples, the second field is nested, and
with 2.3M docs and a max of 113 values it uses 6.1GB vs ~5.3GB
"expected" - basically identical to the non-nested geo) ... perhaps
that's just coincidence (annoying if so, since there seems to be a
workable solution to reducing memory usage in the non-nested
case!)? ... What would you expect the approximate memory usage to be
for a list of nested objects?

When I get a spare moment I'll experiment with forcing documents with
large arrays into their own shard/index (*) and report back on the
memory savings/performance

(*) Am I right that you cache field data by shard and not by
index?

On Tue, 2012-02-07 at 11:09 +0530, PS wrote:

Sorry to interrupt this thread, but I am surprised that you are not
replying to my query. I have asked my query 10 times but still no
answer, yet you are busy with other queries. Is there anybody else who
can answer my query?

Interrupting another thread and, in fact, posting the same query several
times is considered to be spam.

Please don't do it.

None of us is being paid to answer your queries.

Perhaps you need to rethink how you are posting your question to make it
more interesting. Or perhaps your question has already been answered,
and you need to reread the answers that have been given. Or perhaps you
need to read through the documentation on the website. Or perhaps you
need to put together a simple test case that demonstrates the problem
and open an issue on GitHub.

Whatever the correct answer is, demanding an answer is sure to guarantee
you a complete lack of interest from the others on the list.

clint

OK I reran my original tests ... the nested field only requires <1GB
of field cache so you were of course right about that.

The reason I had my incorrect memory usage in the original email is
interesting: if I run a geo facet once (from a clean startup), I get
~6GB usage, but if I run it again (with or without the nested field)
then I get ~12GB usage (so I had incorrectly assumed the extra 6GB was
from the nested field whereas actually it was from the second time
through the geo).

Similarly if I run the nested field facet twice from startup, I get
200-400MB the first time through, then 650MB the second time through.

Subsequent runs don't seem to increase the memory usage.

Any quick explanation for what's going on here? E.g. does it not
necessarily load all replicas the first time through? The only issue
with this is that the geo somehow uses 12GB vs my expected 6GB (with 1
replica) - not sure where I can have dropped a factor of 2 in my memory
calculation... (unless the ordinals array gets copied somewhere?)

My geo array is a duplication of information already stored in nested
objects (for performance, oops!), though annoyingly it can be used for
sorting documents, so I need to see how (/how well) that works with
nested objects.

If sorting and nested objects don't play nice together, I think the
plan will be to use the geo within the nested objects for faceting,
the existing un-nested geo-array for sorting and then use a "soft"
field cache (since geo-sorting is somewhat infrequently used, at least
in the larger datasets), together with sticking all documents with
many geo elements into their own index (or maybe shard). I don't
suppose there's a hidden way of making certain fields use resident
cache and others use soft cache?

The reason it increases is that the first execution hits one set of shards, and the second hits their counterparts (the replicas). Also, I highly recommend against using the soft field cache in any case. If the memory is needed, at any time, let it be there.
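
(Put as arithmetic with the numbers from this thread:)

# why the usage roughly doubles: each shard copy loads its own field data,
# and with 1 replica repeated facets eventually touch both copies
per_copy_gb = 6           # roughly what the first geo facet loaded
replicas = 1
print(per_copy_gb * (1 + replicas))  # ~12GB once primaries and replicas are both hit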

Shay - that makes sense, thanks.

The final guesstimate came out as 4B * num_docs *
max(distinct_field_values_per_doc) * 4 (<- mystery multiplier!) per
original/replica
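
(Plugging the geo numbers from earlier in the thread back in, as a
sketch:)

# sanity check of the revised guesstimate against the geo field
# (2.3M docs, max 154 values in one doc, 1 replica => 2 copies)
def revised_guess(num_docs, max_values_per_doc, copies):
    return 4 * num_docs * max_values_per_doc * 4 * copies  # 4B * ... * mystery 4

print(revised_guess(2_300_000, 154, copies=2) / 1e9)  # ~11.3GB vs the ~12GB observed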

From a sw engineering point of view I agree with you re the field
cache; but with my system engineer hat on, I have a system to which
users can upload data (which in the future they will be able to run
their own aggregations over), and which doesn't auto-provision (yet!),
so the requirement to degrade gracefully (which the soft cache does in
my experience) is very important. I'm willing to live with sub-optimal
performance in low-memory cases (and again, fwiw, it's worked very well
so far for me).

Finally, for information (if anyone's still reading!): I changed my
application logic to put any documents with >20 geo points in a
separate index (<<1% of all documents), and that reduced my index size
across the 3 nodes from ~14.0GB down to ~2.2GB, so that's a 4-line
change well worth doing.

The field cache is a tricky one: you need it to do the computation, but if it grows too large without enough memory to accommodate it, then you get into a problem. The problem is that you can then issue another request that needs that field data, and it will need to be loaded again, so a soft cache will be problematic as well, since it can get into a state where the data gets loaded and evicted each time.
