When running queries that contain a sort, I get
java.lang.OutOfMemoryError: Java heap space.
Looking into the generated heap dump, the culprit seems to be
the ResidentFieldDataCache:
One instance of
"org.elasticsearch.index.cache.field.data.resident.ResidentFieldDataCache"
loaded by "sun.misc.Launcher$AppClassLoader @ 0xf42c0a90" occupies
60.526.328 (38,54%) bytes. The memory is accumulated in one instance of
"java.util.concurrent.ConcurrentHashMap$Segment[]" loaded by "".
(Never mind the sizes; I deliberately minimized the JVM heap so that I can
easily reproduce the problem.)
1. Is the field cache shared by all clients?
2. How can I choose which cache to use
(ResidentFieldDataCache, SoftFieldDataCache, WeakFieldDataCache, my own)?
3.a How can I decide the value of index.cache.field.max_size (given that I
have only one client, which clears the cache before each search and fetches
at most N documents, sorted by a String field, with M shards containing the
indexes to be searched)?
3.b What happens if the cache cannot hold the fields needed to sort
a single search request?
4. Can the right -Xmx value for the JVM handling the client query be
calculated, or can I only find it by measuring?
This cache is not something that can be evicted easily. An "element" in
the cache is the full set of values for a specific field, loaded into memory,
and you need it every time you sort on that field. So a different
caching strategy will not help; you simply need enough memory to
hold all the values of the field you sort on.
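To make the "enough memory" point concrete, here is a rough back-of-the-envelope sketch in Java. It is only an illustration under assumptions that are not from this thread: it assumes the loaded data for a string field is roughly one String per unique term plus one int ordinal per document, and the overhead constants are guesses for a 64-bit JVM. The class and method names are made up. The thing to notice is that the estimate depends on the number of documents and unique terms in the shards, not on how many hits (N) the query asks for.

```java
/** Rough, assumption-laden heap estimate for sorting on one string field. */
final class FieldDataEstimate {
    // Assumed per-String overhead on a 64-bit JVM (object and array headers, fields).
    static final long ASSUMED_STRING_OVERHEAD_BYTES = 40;
    // Assumed size of one per-document ordinal (an int pointing at the document's term).
    static final long ASSUMED_ORDINAL_BYTES = 4;

    /**
     * Estimates the bytes needed to hold one string field for all shards on a node.
     * Java chars take 2 bytes each, hence the factor 2 on the term length.
     */
    static long estimateBytes(long docsPerShard, long uniqueTermsPerShard,
                              long avgTermChars, int shardsOnNode) {
        long termBytes = uniqueTermsPerShard
                * (ASSUMED_STRING_OVERHEAD_BYTES + 2 * avgTermChars);
        long ordinalBytes = docsPerShard * ASSUMED_ORDINAL_BYTES;
        return shardsOnNode * (termBytes + ordinalBytes);
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 1M docs and 200k unique 20-char terms per shard, 5 shards.
        long bytes = estimateBytes(1_000_000L, 200_000L, 20L, 5);
        System.out.println("Estimated field data: " + bytes + " bytes");
    }
}
```

A calculation like this can only give a starting point for -Xmx; measuring with a real heap dump, as done above, is still needed to check the assumptions.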
On Fri, Nov 4, 2011 at 10:43 AM, Trym trym@sigmat.dk wrote:
What happens when a weak cache is used and the GC runs while a sort
is taking place (is the search result still sorted correctly)?
Is it correct that Lucene only keeps a reference (using a priority
queue) to the top results of a sorted search with a max-results limit?
If a node has multiple shards involved in a search, is it only (as in
Lucene) the needed results that are referenced by the cache, or is it all
hits returned by the shards?
Another way of stating this could be: does ES merge shard results by
loading them all into memory and sorting them, or does it just keep a
"priority queue" of the top shard results?
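For readers unfamiliar with the "priority queue" idea these questions refer to, here is a minimal Java sketch. It is an illustration of the general technique only, not the actual Lucene or ES code, and the thread does not confirm how ES merges shard results. The class and method names are made up. It keeps at most n candidates in a bounded heap while scanning values, and then applies the same selection to the per-shard top lists.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

final class TopNSketch {
    /** Returns the n smallest values in ascending order, keeping at most n in the heap. */
    static List<String> topN(Iterable<String> values, int n) {
        // Max-heap: the head is the worst (largest) of the current candidates.
        PriorityQueue<String> heap =
                new PriorityQueue<>(Math.max(1, n), Collections.<String>reverseOrder());
        for (String v : values) {
            if (heap.size() < n) {
                heap.add(v);
            } else if (n > 0 && v.compareTo(heap.peek()) < 0) {
                heap.poll();   // drop the current worst candidate
                heap.add(v);   // keep the better value instead
            }
        }
        List<String> result = new ArrayList<>(heap);
        Collections.sort(result);
        return result;
    }

    /** Merges per-shard top-n lists by running the same bounded selection over them. */
    static List<String> mergeShards(List<List<String>> shardTops, int n) {
        List<String> candidates = new ArrayList<>();
        for (List<String> shard : shardTops) {
            candidates.addAll(shard); // at most n entries per shard
        }
        return topN(candidates, n);
    }

    public static void main(String[] args) {
        List<String> shard1 = topN(List.of("delta", "alpha", "echo"), 2);
        List<String> shard2 = topN(List.of("charlie", "bravo", "foxtrot"), 2);
        System.out.println(mergeShards(List.of(shard1, shard2), 2));
    }
}
```

Note that this only bounds the memory used for the hits being tracked. The values being compared still have to be available for every document that is scanned, which is the part the field cache holds.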
As I said before, all the values for a field need to be loaded into memory
when sorting on it.
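On the weak-cache and GC question, the following Java sketch shows the general language behaviour only. Whether the ES weak field cache works exactly this way is an assumption and is not stated in this thread; the class and method names are made up. In plain Java, once code has obtained a strong reference from a WeakReference, the garbage collector cannot reclaim the object until that strong reference is dropped, so an operation that holds the values for its whole duration sees consistent data.

```java
import java.lang.ref.WeakReference;
import java.util.Arrays;

final class WeakCacheSketch {
    /**
     * Sorts a copy of weakly cached values. Returns null if the values were
     * already collected before the call (a real cache would have to reload them).
     */
    static String[] sortedCopy(WeakReference<String[]> cached) {
        String[] values = cached.get(); // strong reference from here on
        if (values == null) {
            return null;
        }
        System.gc();                    // only a hint; values stays reachable regardless
        String[] copy = values.clone();
        Arrays.sort(copy);
        return copy;
    }

    public static void main(String[] args) {
        String[] loaded = {"b", "a", "c"}; // stand-in for loaded field values
        WeakReference<String[]> cached = new WeakReference<>(loaded);
        System.out.println(Arrays.toString(sortedCopy(cached)));
        System.out.println("still loaded: " + loaded.length); // keeps 'loaded' alive until here
    }
}
```

The cost of a weak or soft cache in a setup like this would be reloading after the values have been collected, not an incorrectly sorted result.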
On Wed, Nov 9, 2011 at 9:12 AM, Trym trym@sigmat.dk wrote:
Thanks for your answer.