Field Cache

Hi

When running queries containing a sort I get a
java.lang.OutOfMemoryError: Java heap space.

Looking into the generated heap dump, the culprit seems to be
the ResidentFieldDataCache:
One instance of
"org.elasticsearch.index.cache.field.data.resident.ResidentFieldDataCache"
loaded by "sun.misc.Launcher$AppClassLoader @ 0xf42c0a90" occupies
60.526.328 (38,54%) bytes. The memory is accumulated in one instance of
"java.util.concurrent.ConcurrentHashMap$Segment[]" loaded by "".
(Don't mind the sizes; I have deliberately minimized the JVM heap to
reproduce the problem easily.)

  1. Is the Field Cache shared by all clients?
  2. How can I choose which cache implementation to use
     (ResidentFieldDataCache, SoftFieldDataCache, WeakFieldDataCache, my own)?
  3a. How can I decide on a value for index.cache.field.max_size (if I only
      have one client, clearing the cache before each search, fetching at most
      N documents, sorted by a String field, with M shards containing the
      indexes to be searched)?
  3b. What happens if the cache cannot hold the fields needed for sorting a
      single search request?
  4. Is measuring the only way to find the right -Xmx value for the JVM
     handling the client query, or can it be calculated?

Thanks for any input.

Best regards,
Trym

Regarding 2): use the setting "index.cache.field.type", found in the class
org.elasticsearch.index.cache.field.data.FieldDataCacheModule.
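
For anyone curious, here is a minimal sketch of switching the cache type on
an embedded node. This assumes the 0.18-era Java API; the "soft" value and
the max_size figure are only illustrations, not recommendations:

    import org.elasticsearch.common.settings.ImmutableSettings;
    import org.elasticsearch.common.settings.Settings;
    import org.elasticsearch.node.Node;
    import static org.elasticsearch.node.NodeBuilder.nodeBuilder;

    public class SoftFieldCacheNode {
        public static void main(String[] args) {
            // "resident" is the default; "soft"/"weak" wrap cache entries in
            // soft/weak references so the GC can reclaim them under pressure.
            Settings settings = ImmutableSettings.settingsBuilder()
                    .put("index.cache.field.type", "soft")
                    .put("index.cache.field.max_size", 10000) // illustrative only
                    .build();
            Node node = nodeBuilder().settings(settings).node();
            // ... run searches against the node, then shut it down:
            node.close();
        }
    }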

Best regards,
Trym

This cache is not something that can be evicted easily. The "elements" in
the cache are all the values for a specific field loaded into memory, which
you need each time you sort on that field. So using a different caching
strategy will not help; you simply need enough memory to hold all the
values of the field you sort on.
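
As a back-of-the-envelope illustration (my own rough numbers, not an
official formula), the field cache footprint for a sort field scales with
the number of documents times the average value size, independent of how
many hits a query returns:

    public class FieldCacheEstimate {
        public static void main(String[] args) {
            // All figures below are assumptions for illustration only.
            long numDocs = 10000000L;     // docs across the shards being sorted
            long avgBytesPerValue = 20L;  // assumed average size of the field values
            long perDocOverhead = 4L;     // assumed per-document ordinal/slot cost
            long estimateBytes = numDocs * (avgBytesPerValue + perDocOverhead);
            // Prints about 228 MB: needed for this one sort field alone,
            // no matter how few hits the query actually returns.
            System.out.println((estimateBytes / (1024 * 1024)) + " MB");
        }
    }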

Thanks for your answer.

  1. What happens when using a weak cache and the GC runs while a sort is
     taking place (is the search result still sorted correctly)?
  2. Is it correct that Lucene only keeps references (in a priority queue)
     to the top results of a sorted search with a maximum result count?
  3. If a node has multiple shards involved in a search, is it only (as in
     Lucene) the needed results that are referenced by the cache, or all
     hits returned by the shards?
     Another way of stating this: does ES merge shard results by loading
     them all into memory and sorting them, or does it just keep a
     "priority queue" of the top shard results (see the sketch after this
     list)?
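
To make question 3 concrete, here is a small sketch of the bounded
priority queue behaviour I have in mind. This is just my own illustration
of the idea, not ES or Lucene code (Lucene internally uses its own
org.apache.lucene.util.PriorityQueue):

    import java.util.PriorityQueue;

    public class TopNSketch {
        public static void main(String[] args) {
            int n = 3;
            int[] scores = {7, 42, 3, 19, 88, 5, 61};
            // Min-heap bounded at n entries: only the current top-n are
            // retained at any time, never all hits.
            PriorityQueue<Integer> topN = new PriorityQueue<Integer>();
            for (int s : scores) {
                if (topN.size() < n) {
                    topN.offer(s);
                } else if (s > topN.peek()) {
                    topN.poll();   // evict the smallest of the current top-n
                    topN.offer(s);
                }
            }
            // Holds exactly the three largest values: 42, 61, 88 (heap order).
            System.out.println(topN);
        }
    }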

Best regards,
Trym

As I said before, all the values for a field need to be loaded into memory
when you sort on it.
