Soft field data cache isn't evicting data?

I'm load testing a reporting service that will allow users to run
arbitrarily generated facets against a large set of data (~100m new records
per day, using daily index rotation). We have a 3-node ES cluster with
72 GB of RAM each (half allocated to ES, half reserved for the OS and disk
cache). We're running ES 0.18.7.

I'm seeing regular out-of-memory errors that result in a particular node
locking up and dropping out of the cluster. Flushing the field cache or
fully restarting the node brings everything back into a good state, but I'd
rather avoid getting into this state in the first place. I tried setting
index.cache.field.type: soft in /etc/elasticsearch.yml, but this isn't
having the desired effect - is that not where one sets this? Do I need to
set it per index using the index settings API?
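In case it's relevant, the per-index version I'd be trying is something
like the following (the index name is just an example from our daily
rotation, and I'm not even sure this particular setting can be changed on a
live index):

  curl -XPUT 'http://localhost:9200/logs-2012-03-09/_settings' -d '
  {
    "index.cache.field.type" : "soft"
  }'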

I've read elsewhere that regularly evicting entries from the cache is
undesirable, but I don't see an alternative here - our data set is always
going to be too large to fit in memory, and I'm willing to accept the
performance trade-off since most of our faceted queries are relatively
unique and I'd rather be able to query slowly than worry about crashing ES
nodes by running out of memory.

I'd rather not write a cron job to flush the caches when memory usage
creeps up. It definitely sounds like there's a better way, though, and I'm
just missing something semi-obvious.
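(If we did go that route, I assume the cron job would just hit the clear
cache API with the field data flag, something like this - assuming I have
the parameter name right for 0.18:

  curl -XPOST 'http://localhost:9200/_cache/clear?field_data=true'

but that still feels like a workaround rather than a fix.)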

Thanks,
-- Dustin

The field cache will be fully loaded for any execution that requires it, so most of the time (if not always), evicting entries from it makes little sense. The soft field cache relies on the JVM to clear data when it comes under memory pressure, but that's not really reliable...

That makes sense - thanks. I think cache eviction does make sense in this
particular use case: we're storing log data, and our queries tend to be
concerned with the past few days. We do occasionally run faceted queries
across larger date ranges, but in general, I don't think we need to be
concerned with caching fields from days-old indices, since we're so much
less likely to need to query them. Regardless of how much RAM we throw at
this, our total dataset is always going to be larger, so I don't see any
way to cache everything.
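If there's no automatic way to do this, one stopgap we could live with
(assuming the clear cache API can be scoped to a single index - the index
name below is just an example from our rotation) would be to explicitly
drop the field cache on indices older than a few days:

  curl -XPOST 'http://localhost:9200/logs-2012-03-01/_cache/clear?field_data=true'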

It does seem clear, though, that we need to ensure that there's always
sufficient memory to cache the commonly-faceted fields for a few days'
worth of indices.
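To figure out what "sufficient" means in practice, I'll probably just
watch the field cache size in the node stats while the load test runs -
something like this, if I'm reading the 0.18 docs right:

  curl 'http://localhost:9200/_cluster/nodes/stats?pretty=true'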