With my current setup of ElasticSearch, I cannot support the memory requirements of the field cache. I was wondering whether there has been any thought given to turning it off completely.
I have a lot of users performing various facets across quite a bit of data. With our front-end UI we restrict the number of documents they can facet on, and which fields. This lets us keep the memory utilization per facet in check. For example:
Facet #1 - Quick facet
- Execution time: 0.75 seconds
- Field cache size per node: 181 MB
- Hits: 49,784,204
- Memory per hit: 3.8 bytes
Facet #2 - Larger facet
- Execution time: 17 seconds
- Field cache size per node: 4.3 GB
- Hits: 502,427,307
- Memory per hit: 9.5 bytes
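As a quick sanity check of the per-hit figures above (assuming the reported sizes are binary megabytes/gigabytes), the division works out roughly as stated:

```python
# Rough check of field-cache bytes per hit, assuming binary units (MiB/GiB).
quick_cache_bytes = 181 * 1024**2        # 181 MB per node
quick_hits = 49_784_204
print(round(quick_cache_bytes / quick_hits, 1))   # ≈ 3.8 bytes per hit

large_cache_bytes = int(4.3 * 1024**3)   # 4.3 GB per node
large_hits = 502_427_307
print(round(large_cache_bytes / large_hits, 1))   # ≈ 9.2 bytes per hit
```

The second figure comes out slightly under the 9.5 quoted above; the exact number depends on the precise cache size at the moment it was sampled.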
Because of this, my cluster is almost guaranteed to run out of memory on the cache sizes of some of the larger facets. I was looking into the documentation, and I can set max_size to 1, but the code still puts data in the cache and then evicts it almost immediately.
So I'm wondering if I can just disable the cache entirely. I know queries won't be nearly as fast, but the cluster would be stable and have fewer OutOfMemory issues. We had the exact same problem with Solr, and disabled the field cache there as well.
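For reference, the knobs available in the 0.x-era builds discussed in this thread look roughly like the sketch below (setting names as I understand them from that era's docs; verify against your version, since exact names and semantics vary). Note that, as described above, max_size only evicts entries after they are loaded, so it caps steady-state size rather than the peak during loading:

```
# elasticsearch.yml -- hypothetical sketch for 0.x-era field cache settings
index.cache.field.type: soft       # soft references let the JVM reclaim entries under memory pressure
index.cache.field.max_size: 10000  # evict entries beyond this count (after loading, not before)
index.cache.field.expire: 10m      # drop entries not accessed within this window
```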
I found the setting, but have come to terms with the fact that I cannot disable the field cache; the query time becomes abysmal.
I'm now looking into ways to limit the amount of data in the field cache to protect my cluster from running out of memory. Any advice on a good setting for the number of documents in the cache?
There is no way to limit the number of documents, as all of the data is needed when computing it. I really hope to address this and improve things in future versions, but for now you need to either decide not to facet on that field, or add more memory / machines.
kimchy, I'm confused by your statement about not being able to limit the number of documents. Does this mean that the following configuration doesn't actually work as documented?
index.cache.field.max_size: 1000
Logan
On Tuesday, October 25, 2011 9:17:20 PM UTC-6, kimchy wrote:
There is no way to limit the number of documents, as all of the data is needed when computing it. I really hope to address this and improve things in future versions, but for now you need to either decide not to facet on that field, or add more memory / machines.
I just realized that I replied to a very old post that may no longer be applicable to current ES builds. I'll start a new post on this topic.
Logan
On Monday, November 5, 2012 3:31:37 PM UTC-7, Logan wrote:
kimchy, I'm confused by your statement about not being able to limit the number of documents. Does this mean that the following configuration doesn't actually work as documented?
index.cache.field.max_size: 1000