I've done a Cassandra gateway implementation, and have some questions

I've done a Cassandra gateway implementation as a plugin at
https://github.com/gistinc/elasticsearch/tree/cassandra. It's built with
ElasticSearch 0.13.0. I'm using Cassandra 0.6.8.

I'm wondering how Cassandra's eventual consistency model interacts with how
ElasticSearch uses its gateway. I admit I have no idea how ElasticSearch
uses its gateway, especially from multiple nodes in the cluster. It seems
safest to use Cassandra's QUORUM consistency level (which requires at least
three Cassandra nodes to tolerate one failing), but it's not clear whether
multiple ElasticSearch nodes might be writing the same blob concurrently,
and what they expect to happen if they do. Can somebody enlighten me about
that?
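To make the arithmetic concrete, here's a quick Python sketch of how QUORUM works out for a few replication factors (just my understanding of Cassandra's rule, not code from the plugin):

```python
# QUORUM is a majority of the replicas for a key:
#   quorum = floor(replication_factor / 2) + 1
# so a quorum read/write survives rf - quorum replica failures.

def quorum(rf):
    return rf // 2 + 1

def tolerated_failures(rf):
    return rf - quorum(rf)

for rf in (1, 2, 3, 5):
    print(rf, quorum(rf), tolerated_failures(rf))
# rf=3 gives quorum=2 and tolerates 1 failure; rf=2 needs both
# replicas, which is why three is the minimum to survive one node down.
```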

The Cassandra gateway is not much beyond a proof of concept, but it does work,
and we're currently testing with it at Gist (http://www.gist.com/). Its
biggest limitation is that it only talks to a single Cassandra server. It's
using a bare Thrift interface; I'll be looking into using an existing
higher-level Java interface that supports failover and such.

I'm not sure how well it will handle very large blobs. Handling large blobs
wasn't a goal. We intend to use this with per-user indexes that won't get
very large. We're also using it with the code on the gistlru branch of that
same git repo, written by my co-worker Matt, which allows us to use
index.store.type = memory and keep only indexes for active users in memory,
the rest being persisted to the gateway. We hope.
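I haven't dug into Matt's code in detail, but the idea is roughly an LRU map from user to in-memory index, evicting idle ones to the gateway. A toy Python sketch of that idea (all names are made up, nothing here is from the actual gistlru branch):

```python
from collections import OrderedDict

class LruIndexCache:
    """Keep at most `capacity` per-user indexes in memory, evicting the
    least recently used one to the gateway. Illustrative only; a dict
    stands in for the blob store."""

    def __init__(self, capacity, gateway):
        self.capacity = capacity
        self.gateway = gateway      # stand-in for the persistent gateway
        self.cache = OrderedDict()  # user -> in-memory index

    def get(self, user):
        if user in self.cache:
            self.cache.move_to_end(user)  # mark as recently used
        else:
            # cache miss: recover the user's index from the gateway
            self.cache[user] = self.gateway.get(user, {})
            self._evict()
        return self.cache[user]

    def put(self, user, index):
        self.cache[user] = index
        self.cache.move_to_end(user)
        self._evict()

    def _evict(self):
        while len(self.cache) > self.capacity:
            # persist the least recently used index, then drop it
            user, index = self.cache.popitem(last=False)
            self.gateway[user] = index
```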

Tom

On Thu, Dec 2, 2010 at 10:14 PM, Tom May tom@tommay.net wrote:

I've done a Cassandra gateway implementation as a plugin at
https://github.com/gistinc/elasticsearch/tree/cassandra. It's built with
Elasticsearch 0.13.0. I'm using Cassandra 0.6.8.

Great job! Once it's settled and done, we can maybe think about getting it
into Elasticsearch itself, though my Cassandra knowledge is quite lacking :)

I'm wondering how Cassandra's eventual consistency model interacts with how
Elasticsearch uses its gateway. I admit I have no idea how Elasticsearch
uses its gateway, especially from multiple nodes in the cluster. It seems
safest to use Cassandra's QUORUM consistency level (which requires at least
three Cassandra nodes to tolerate one failing), but it's not clear whether
multiple Elasticsearch nodes might be writing the same blob concurrently,
and what they expect to happen if they do. Can somebody enlighten me about
that?

Only the primary shard in each shard replication group writes to the blob
store, so there is only a single writer per shard path. I think QUORUM is a
good choice: if a write fails, it will simply be retried the next time.
Reading from the blob store only happens when a primary shard is allocated
and needs to recover its data.

The Cassandra gateway is not much beyond a proof of concept, but it does
work, and we're currently testing with it at Gist (http://www.gist.com/).
Its biggest limitation is that it only talks to a single Cassandra server.
It's using a bare Thrift interface; I'll be looking into using an existing
higher-level Java interface that supports failover and such.

Sounds great!

I'm not sure how well it will handle very large blobs. Handling large
blobs wasn't a goal. We intend to use this with per-user indexes that won't
get very large. We're also using it with the code on the gistlru branch of
that same git repo, written by my co-worker Matt, which allows us to use
index.store.type = memory and keep only indexes for active users in memory,
the rest being persisted to the gateway. We hope.

There is already automatic support for that. If you
set gateway.blobstore.chunk_size, files will automatically be broken
down into several blobs. This is how it is done with the s3 gateway, which
actually sets the default chunk size to 100mb.
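The chunking itself is simple to picture; a simplified sketch (the blob naming here is made up, the real blob store names things differently):

```python
def chunk(data, chunk_size):
    """Split a file's bytes into chunk_size pieces, one blob each."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

# a 250-byte "file" with a 100-byte chunk size becomes three blobs
blobs = chunk(b"x" * 250, 100)
print([len(b) for b in blobs])  # [100, 100, 50]
```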


On Thu, Dec 2, 2010 at 4:51 PM, Shay Banon shay.banon@elasticsearch.com wrote:

On Thu, Dec 2, 2010 at 10:14 PM, Tom May tom@tommay.net wrote:

I've done a Cassandra gateway implementation as a plugin at
https://github.com/gistinc/elasticsearch/tree/cassandra. It's built with
Elasticsearch 0.13.0. I'm using Cassandra 0.6.8.

Great job! Once it's settled and done, we can maybe think about getting it
into Elasticsearch itself, though my Cassandra knowledge is quite lacking :)

Thanks. I'm all for it being included in Elasticsearch at some point.

And I've just realized I could factor things more cleanly, so I'm working on
that :)

I'm wondering how Cassandra's eventual consistency model interacts with
how Elasticsearch uses its gateway. I admit I have no idea how
Elasticsearch uses its gateway, especially from multiple nodes in the
cluster. It seems safest to use Cassandra's QUORUM consistency level (which
requires at least three Cassandra nodes to tolerate one failing), but it's
not clear whether multiple Elasticsearch nodes might be writing the same
blob concurrently, and what they expect to happen if they do. Can somebody
enlighten me about that?

Only the primary shard in each shard replication group writes to the blob
store, so there is only a single writer per shard path. I think QUORUM is a
good choice: if a write fails, it will simply be retried the next time.
Reading from the blob store only happens when a primary shard is allocated
and needs to recover its data.

Good news, thanks.

The Cassandra gateway is not much beyond a proof of concept, but it does
work, and we're currently testing with it at Gist (http://www.gist.com/).
Its biggest limitation is that it only talks to a single Cassandra server.
It's using a bare Thrift interface; I'll be looking into using an existing
higher-level Java interface that supports failover and such.

Sounds great!

I'm not sure how well it will handle very large blobs. Handling large
blobs wasn't a goal. We intend to use this with per-user indexes that won't
get very large. We're also using it with the code on the gistlru branch of
that same git repo, written by my co-worker Matt, which allows us to use
index.store.type = memory and keep only indexes for active users in memory,
the rest being persisted to the gateway. We hope.

There is already automatic support for that. If you
set gateway.blobstore.chunk_size, files will automatically be broken
down into several blobs. This is how it is done with the s3 gateway, which
actually sets the default chunk size to 100mb.

Ok, I saw that, but I pretty well ignored it for this go, since I could.

Tom


I think that in any case, it would make sense to initialize the chunk_size to a specific value when working with Cassandra, something like 10-50mb. This will speed up the snapshotting process, since the chunks are snapshotted in parallel. Also, it will make the Cassandra gateway good for any type of usage "out of the box".
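To illustrate the parallelism point: smaller chunks mean more blobs that can be written concurrently. A rough sketch (a dict stands in for the blob store; this is not the gateway's actual code):

```python
from concurrent.futures import ThreadPoolExecutor

def snapshot(chunks, store, workers=4):
    """Write each chunk as its own blob, in parallel.
    `store` is a plain dict standing in for the blob store."""
    def write(item):
        index, data = item
        store[index] = data  # one blob per chunk
        return index
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sorted(pool.map(write, enumerate(chunks)))
```

With a 100mb chunk size a small index is a single blob and nothing happens in parallel; 10-50mb chunks give several blobs to write concurrently.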

I'll look into it.

Tom
