If you are using the S3 gateway, you can control the snapshot interval (it defaults to 10 seconds) with the index.gateway.snapshot_interval setting. There is also a snapshot API for triggering a snapshot explicitly.
Note that a snapshot operation only writes the changes made since the previous snapshot, not the whole data set every time.
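As a rough illustration of calling the snapshot API, here is a minimal Python sketch that only builds the HTTP request. The `_gateway/snapshot` path is how I recall the endpoint, so please check it against the docs for your version; the host, port, and the index name `my_index` are placeholders, and `build_snapshot_request` is just a helper name made up for this example.

```python
from urllib.request import Request


def build_snapshot_request(host, index=None):
    """Build (but do not send) a POST request for the gateway snapshot API.

    The '_gateway/snapshot' path is an assumption; verify it for your version.
    """
    base = host.rstrip("/")
    # Without an index name the request is assumed to target all indices.
    path = f"/{index}/_gateway/snapshot" if index else "/_gateway/snapshot"
    return Request(base + path, data=b"", method="POST")


# Placeholder host and index name.
req = build_snapshot_request("http://localhost:9200", "my_index")
print(req.get_method(), req.full_url)
```

To actually send it, pass the request to `urllib.request.urlopen(req)` against a running node.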
Setting it up only requires configuring the S3 gateway parameters on all nodes. A cluster uses a single bucket (and inner path) for it, so the nodes do not each need a separate path. The local data on the EC2 instances is stored under the data location.
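To make the "same parameters on all nodes" point concrete, here is a small Python sketch that renders the settings as flat `key: value` lines for each node's configuration file. Only index.gateway.snapshot_interval is named above; the other key names (`gateway.type`, `gateway.s3.bucket`, `cloud.aws.access_key`, `cloud.aws.secret_key`) are written from memory and should be treated as assumptions to verify for your version. The bucket name and credentials are placeholders, and `s3_gateway_settings` is a hypothetical helper.

```python
def s3_gateway_settings(bucket, access_key, secret_key, snapshot_interval="10s"):
    """Return flat 'key: value' lines to place in every node's config file.

    All key names except index.gateway.snapshot_interval are assumptions.
    """
    settings = {
        "gateway.type": "s3",
        "gateway.s3.bucket": bucket,  # one bucket shared by the whole cluster
        "cloud.aws.access_key": access_key,
        "cloud.aws.secret_key": secret_key,
        "index.gateway.snapshot_interval": snapshot_interval,
    }
    return "\n".join(f"{key}: {value}" for key, value in settings.items())


# Placeholder values; every node would get the same output.
print(s3_gateway_settings("my-es-bucket", "ACCESS_KEY", "SECRET_KEY"))
```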
Finally, consider using the local gateway on AWS; depending on your availability requirements, you can store the data either on the instance's local disk or on EBS.
On Sunday, January 30, 2011 at 5:12 PM, si wrote:
What is the time period for updating the AWS S3 data, and is it a static
period, configurable, or based on activity?
My second question: if I have a replication factor of 2 (and no need
to shard, sharding factor 1), how should I set up my S3 configuration? Do
I need to write both EC2 instances' data to S3 in separate paths, or in
the same path? Are they written as separate files, or do both use the same
file chunks? My last question: what is the recommended way to do
backups, e.g. copying the S3 chunks daily to a backup folder?