I've implemented a Cassandra gateway as a plugin at
https://github.com/gistinc/elasticsearch/tree/cassandra. It's built against
ElasticSearch 0.13.0, and I'm using Cassandra 0.6.8.
I'm wondering how Cassandra's eventual consistency model interacts with how
ElasticSearch uses its gateway. I admit I have no idea how ElasticSearch
uses its gateway, especially from multiple nodes in the cluster. It seems
safest to use Cassandra's QUORUM consistency level (which needs a replication
factor of at least three to tolerate one replica failing), but it's not clear whether
multiple ElasticSearch nodes might be writing the same blob concurrently,
and what they expect to happen if they do. Can somebody enlighten me about
that?
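
For reference, the QUORUM arithmetic behind that concern can be sketched as below. This is only an illustration of the standard rule (a quorum is a majority of the replicas, and a read overlaps the latest acknowledged write when R + W > RF); the class and method names are made up for this example and are not part of the plugin or of Cassandra's API.

```java
// Hypothetical helper, not part of the gateway plugin: plain arithmetic only.
final class QuorumMath {
    // Replicas that must acknowledge a QUORUM read or write: floor(rf / 2) + 1.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // Replicas that can be down while QUORUM operations still succeed.
    static int toleratedFailures(int replicationFactor) {
        return replicationFactor - quorum(replicationFactor);
    }

    // A read overlaps the latest acknowledged write when R + W > RF.
    static boolean readSeesLatestWrite(int readReplicas, int writeReplicas,
                                       int replicationFactor) {
        return readReplicas + writeReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        for (int rf = 1; rf <= 5; rf++) {
            System.out.println("RF=" + rf
                    + " quorum=" + quorum(rf)
                    + " tolerated failures=" + toleratedFailures(rf));
        }
    }
}
```

Note that this overlap only covers a read that follows an acknowledged write. It says nothing about two writers updating the same blob at the same time, which is the open question above.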
The Cassandra gateway is not much more than a proof of concept, but it does work
and we're currently testing with it at Gist (http://www.gist.com/). Its
biggest limitation is that it only talks to a single Cassandra server. It
uses the bare Thrift interface; I'll be looking into using an existing
higher-level Java client that supports failover and such.
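
To make the failover idea concrete, here is a minimal sketch of trying several Cassandra hosts in turn. It is not how the plugin works today and is not any existing client library's API: `FailoverClient`, `execute`, and the use of a plain `Function` in place of a real Thrift call are all assumptions for illustration. A real Thrift call throws checked exceptions, which would need wrapping.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: run an operation against the first host that answers.
final class FailoverClient {
    private final List<String> hosts;
    private int preferred = 0; // index of the host that last succeeded

    FailoverClient(List<String> hosts) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("need at least one host");
        }
        this.hosts = new ArrayList<>(hosts);
    }

    // Try each host once, starting with the preferred one.
    <T> T execute(Function<String, T> operation) {
        RuntimeException last = null;
        for (int i = 0; i < hosts.size(); i++) {
            int index = (preferred + i) % hosts.size();
            try {
                T result = operation.apply(hosts.get(index));
                preferred = index; // stick with the host that worked
                return result;
            } catch (RuntimeException e) {
                last = e; // remember the failure and move on to the next host
            }
        }
        throw new IllegalStateException("all hosts failed", last);
    }
}
```

A caller would pass in whatever talks to one server, for example `client.execute(host -> readBlob(host, key))`, where `readBlob` is a placeholder for the actual Thrift call.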
I'm not sure how well it will handle very large blobs. Handling large blobs
wasn't a goal. We intend to use this with per-user indexes that won't get
very large. We're also using it with the code on the gistlru branch of that
same git repo, written by my co-worker Matt, which allows us to use
index.store.type = memory and keep only indexes for active users in memory,
the rest being persisted to the gateway. We hope.
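
The "only active users in memory" idea can be illustrated with a small least-recently-used map. This is not the code on the gistlru branch, just a sketch under the assumption that an index is keyed by user id; `ActiveIndexCache` and `maxActive` are made-up names, and a real version would have to make sure an index is safely in the gateway before dropping it.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: keep at most maxActive entries, evicting the least
// recently accessed one when a new entry pushes the map over the limit.
final class ActiveIndexCache<V> extends LinkedHashMap<String, V> {
    private static final long serialVersionUID = 1L;
    private final int maxActive;

    ActiveIndexCache(int maxActive) {
        super(16, 0.75f, true); // true = order entries by access, not insertion
        this.maxActive = maxActive;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
        // Called after each put; returning true drops the eldest entry.
        // A real implementation would persist the entry before evicting it.
        return size() > maxActive;
    }
}
```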
Tom