I took some heap dumps and it looks like the difference is that the bad
scenario index has many, many docs in it (not all of the type being
searched) and so the ordinals array is huge.
Good scenario: the SingleValueLongFieldData instances have a valuesCache
array about the same size as the ordinals array.
Bad scenario: the SingleValueLongFieldData instances have a valuesCache array
of size 21 and an ordinals array of size 45651.
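If my reading of the heap dump is right, the per-segment layout is roughly the following. This is a hypothetical sketch, not the actual ES source (SingleValueLongFieldDataSketch, valueForDoc, and approxBytes are my own names); it just shows why the ordinals array scales with total docs in the segment rather than with docs of the searched type:

```java
// Hypothetical sketch of the per-segment field data layout as I read it
// from the heap dump -- NOT the actual Elasticsearch source.
class SingleValueLongFieldDataSketch {
    final long[] valuesCache; // one slot per distinct value -> small (21 in the bad case)
    final int[] ordinals;     // one slot per doc in the segment -> huge (45651 in the bad case)

    SingleValueLongFieldDataSketch(long[] valuesCache, int[] ordinals) {
        this.valuesCache = valuesCache;
        this.ordinals = ordinals;
    }

    // Value lookup goes through the ordinals indirection, so every doc in
    // the segment needs an ordinal slot even if it isn't of the type being
    // searched and never has this field.
    long valueForDoc(int docId) {
        return valuesCache[ordinals[docId]];
    }

    // Rough memory estimate (ignoring object headers): the ordinals array
    // dominates whenever the segment holds many docs of other types.
    long approxBytes() {
        return 8L * valuesCache.length + 4L * ordinals.length;
    }
}
```

Under this model, the bad scenario's ordinals array alone is about 45651 × 4 bytes ≈ 178KB per instance, so a few per-facet copies add up fast.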
I posted a revised gist (https://gist.github.com/3018581) that sets up this
scenario, and indeed, the same operation uses orders of magnitude more field
cache memory.
Is there some way to mitigate this? Can the ordinals array be shared across
multiple FieldDatas? What really screws us is that each facet seems to
build its own ordinals array and retain it in memory, so we're paying 1.5MB
per 200KB of actual field data.
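To make the question concrete, the kind of sharing I'm imagining is roughly this. Purely illustrative: SharedOrdinalsCache, ordinalsFor, and segmentKey are made-up names, not ES APIs. The idea is one ordinals array cached per segment, handed out to every FieldData/facet that needs it:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative only: a per-segment cache so that multiple FieldData
// instances (one per facet, today) could share a single ordinals array
// instead of each building and retaining its own copy. "segmentKey"
// stands in for whatever identifies an index segment/reader.
class SharedOrdinalsCache {
    private final Map<Object, int[]> ordinalsBySegment = new ConcurrentHashMap<>();

    int[] ordinalsFor(Object segmentKey, int docCount) {
        // computeIfAbsent builds the array at most once per segment;
        // later callers get the same reference, so N facets pay for
        // one ordinals array rather than N.
        return ordinalsBySegment.computeIfAbsent(segmentKey, k -> new int[docCount]);
    }
}
```

With this, two facets asking for ordinals on the same segment would get back the identical array rather than two 178KB copies.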
On Thursday, 28 June 2012 23:58:11 UTC-4, Colin Dellow wrote:
Executing a statistical facet on a long field on a clean index with 10K
items uses 120KB in the field cache, e.g. this gist: https://gist.github.com/3015553.
That's 12 bytes per long, which seems great.
I have another index with 3.5K items. Doing a statistical facet on a long
field in it uses 1.5MB in the field cache. That's 400 bytes per long, which
seems excessive.
They should both be single-valued fields--I'm not getting why memory usage
is going so crazy. One possible difference is that the other index has seen
lots of inserts, updates, and deletes, in case that might affect things.
I've run an optimize on the index, with no change in memory usage. What
factors could influence memory usage? Both indexes are on the same ES node
with stock settings for number of shards.