Field Cache

Hi

When running queries containing a sort I get a
java.lang.OutOfMemoryError: Java heap space.

Looking into the generated heap dump, the culprit seems to be
the ResidentFieldDataCache:
One instance of
"org.elasticsearch.index.cache.field.data.resident.ResidentFieldDataCache"
loaded by "sun.misc.Launcher$AppClassLoader @ 0xf42c0a90" occupies
60.526.328 (38,54%) bytes. The memory is accumulated in one instance of
"java.util.concurrent.ConcurrentHashMap$Segment[]" loaded by "".
(Don't mind the sizes; I have deliberately minimized the JVM heap to
reproduce the problem easily.)

  1. Is the Field Cache shared by all clients?
  2. How can I choose which cache implementation to use
     (ResidentFieldDataCache, SoftFieldDataCache, WeakFieldDataCache, my own)?
  3a. How can I decide on a value for index.cache.field.max_size (if I only
      have one client, clearing the cache before each search, fetching at most
      N documents, sorted by a String field, with M shards containing the
      indexes to be searched)?
  3b. What happens if the cache cannot hold the fields needed for sorting a
      single search request?
  4. Is measuring the only way to find the right -Xmx value for the JVM
     handling the client query, or can it be calculated?

Thanks for any input.

Best regards,
Trym

Regarding 2): use the setting "index.cache.field.type", found in the class
org.elasticsearch.index.cache.field.data.FieldDataCacheModule.
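
For anyone curious, here is a minimal sketch of switching the cache type on
an embedded node. This assumes the 0.18-era Java API; the "soft" value and
the max_size figure are only illustrations, not recommendations:

    import org.elasticsearch.common.settings.ImmutableSettings;
    import org.elasticsearch.common.settings.Settings;
    import org.elasticsearch.node.Node;
    import static org.elasticsearch.node.NodeBuilder.nodeBuilder;

    public class SoftFieldCacheNode {
        public static void main(String[] args) {
            // "resident" is the default; "soft"/"weak" wrap cache entries in
            // soft/weak references so the GC can reclaim them under pressure.
            Settings settings = ImmutableSettings.settingsBuilder()
                    .put("index.cache.field.type", "soft")
                    .put("index.cache.field.max_size", 10000) // illustrative only
                    .build();
            Node node = nodeBuilder().settings(settings).node();
            // ... run searches against the node, then shut it down:
            node.close();
        }
    }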

Best regards,
Trym

This cache is not something that can be evicted easily. The "elements" in
the cache are all the values for a specific field loaded into memory, which
you need each time you sort on that field. So using a different caching
strategy will not help; you simply need enough memory to hold all the
values of the field you sort on.
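
As a back-of-the-envelope illustration (my own rough numbers, not an
official formula), the field cache footprint for a sort field scales with
the number of documents times the average value size, independent of how
many hits a query returns:

    public class FieldCacheEstimate {
        public static void main(String[] args) {
            // All figures below are assumptions for illustration only.
            long numDocs = 10000000L;     // docs across the shards being sorted
            long avgBytesPerValue = 20L;  // assumed average size of the field values
            long perDocOverhead = 4L;     // assumed per-document ordinal/slot cost
            long estimateBytes = numDocs * (avgBytesPerValue + perDocOverhead);
            // Prints about 228 MB: needed for this one sort field alone,
            // no matter how few hits the query actually returns.
            System.out.println((estimateBytes / (1024 * 1024)) + " MB");
        }
    }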

Thanks for your answer.

  1. What happens when using a weak cache and the GC runs while a sort is
     taking place (is the search result still sorted correctly)?
  2. Is it correct that Lucene only keeps references (in a priority queue)
     to the top results of a sorted search with a maximum result count?
  3. If a node has multiple shards involved in a search, is it only (as in
     Lucene) the needed results that are referenced by the cache, or all
     hits returned by the shards?
     Another way of stating this: does ES merge shard results by loading
     them all into memory and sorting them, or does it just keep a
     "priority queue" of the top shard results (see the sketch after this
     list)?
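
To make question 3 concrete, here is a small sketch of the bounded
priority queue behaviour I have in mind. This is just my own illustration
of the idea, not ES or Lucene code (Lucene internally uses its own
org.apache.lucene.util.PriorityQueue):

    import java.util.PriorityQueue;

    public class TopNSketch {
        public static void main(String[] args) {
            int n = 3;
            int[] scores = {7, 42, 3, 19, 88, 5, 61};
            // Min-heap bounded at n entries: only the current top-n are
            // retained at any time, never all hits.
            PriorityQueue<Integer> topN = new PriorityQueue<Integer>();
            for (int s : scores) {
                if (topN.size() < n) {
                    topN.offer(s);
                } else if (s > topN.peek()) {
                    topN.poll();   // evict the smallest of the current top-n
                    topN.offer(s);
                }
            }
            // Holds exactly the three largest values: 42, 61, 88 (heap order).
            System.out.println(topN);
        }
    }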

Best regards,
Trym

As I said before, all the values for a field need to be loaded into memory
when you sort on it.
