Using ElasticSearch as an Object Cache

We use ElasticSearch to supplement our large legacy DB, and the search
features are nice. However, we also need a fast object cache. I'm
curious whether anyone else uses ElasticSearch as a cache, e.g. as a
replacement for memcached or ehcache. Any advice on optimizing an index
to act like a cache? Is it possible to configure an eviction policy? I
suspect a memory store would probably be best.

Thanks in advance!

--

I use caches to cache ElasticSearch data, so that should tell you what I
think. :slight_smile:

Search engines are databases with one important extra feature: scoring.
Documents are returned sorted according to their score. Take away
scoring/ordering and I do not see a use case for a search engine. Of
course, if you already have ES up and running, it might be easier to use
the existing technology rather than add another stack, but I would look
into memcached, ehcache, or even Redis.

Cheers,

Ivan

On Tue, Dec 4, 2012 at 1:05 PM, Andy apryor48@gmail.com wrote:


--

Elasticsearch is not a cache, but it can be configured for comparably
fast response times:

  • lots of RAM, a large heap, and large direct memory
  • the memory index store type; force the process to stay RAM-resident with mlockall()
  • restrict yourself to relatively small and simple docs (not MB-sized)
  • rare inserts/updates (compared to the number of requests)
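
A minimal sketch of the first two points as they looked in the 0.20-era
configuration (setting names are assumptions to verify against your
release):

```yaml
# elasticsearch.yml -- sketch only, check the names against your ES version
bootstrap.mlockall: true    # lock the JVM heap in RAM via mlockall() (pair with "ulimit -l unlimited")
index.store.type: memory    # RAM-resident index store instead of on-disk files
```

Heap and direct memory sizes are set outside this file, e.g. via the
ES_HEAP_SIZE environment variable or the -Xmx / -XX:MaxDirectMemorySize
JVM flags.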

So, as long as the data in the index does not exceed your resources,
almost all disk accesses and almost all of the serialization overhead can
be eliminated, and as a result you would have something like a "memory
object cache".

There are of course challenges unknown to memcached and the like. If you
run a common Elasticsearch workload, with ingests and queries on the
cluster beside such a "cache", the cache setup tends to get trashed in
favor of Lucene doing indexing and search. It is a bit harder to persuade
Lucene not to do what it was invented for, that is, building an inverted
index and doing fast search :slight_smile:

But the future is bright. With the DocValues feature of the Lucene 4
codec framework - there is a great presentation on it by Simon Willnauer -
key/value structured data will perform better, together with efficient
value updating; Lucene will no longer be forced to reindex the whole
document. In most cases, you will be able to store and fetch simple
values from Elasticsearch DocValues fields very fast, as if it were a
memory cache.

Ivan pointed to Redis. Lucene codecs will be very powerful: some curious
people have even started a Lucene 4 codec with Redis as a backend.

Note that a simple eviction policy is available in Elasticsearch at the
document level via the ttl field mechanism. Internally, it works like an
automatic bulk delete, triggered roughly once a minute:
http://www.elasticsearch.org/guide/reference/mapping/ttl-field.html
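
The ttl mechanism can be pictured with a small toy model (pure Python;
this is not the actual Elasticsearch implementation, and the class and
method names here are made up for illustration - in ES itself you enable
it per type via the _ttl field in the mapping, see the linked doc):

```python
import time

class TtlIndex:
    """Toy model of the _ttl mechanism: each document carries an expiry
    timestamp, and a periodic sweep bulk-deletes expired documents
    (ES runs that purge roughly once a minute)."""

    def __init__(self, default_ttl=60.0):
        self.default_ttl = default_ttl
        self.docs = {}  # doc_id -> (source, expires_at)

    def index(self, doc_id, source, ttl=None):
        expires_at = time.time() + (ttl if ttl is not None else self.default_ttl)
        self.docs[doc_id] = (source, expires_at)

    def get(self, doc_id):
        entry = self.docs.get(doc_id)
        if entry is None:
            return None
        source, expires_at = entry
        # Like in ES, an expired doc may linger until the next purge;
        # hiding it on read gives cache-like semantics in the meantime.
        return source if time.time() < expires_at else None

    def purge(self):
        """The 'automatic bulk delete': drop every expired document."""
        now = time.time()
        expired = [i for i, (_, exp) in self.docs.items() if exp <= now]
        for i in expired:
            del self.docs[i]
        return len(expired)

idx = TtlIndex(default_ttl=0.05)
idx.index("user:1", {"name": "andy"})
idx.index("user:2", {"name": "ivan"}, ttl=10.0)
time.sleep(0.1)
print(idx.get("user:1"))   # None -- expired, hidden on read
print(idx.purge())         # 1    -- the bulk delete removed one doc
print(idx.get("user:2"))   # {'name': 'ivan'} -- still alive
```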

Cheers,

Jörg

--