What is the time period for updating the AWS S3 data? Is it a static
period, configurable, or based on activity?
My second question: if I have a replication factor of 2 (and no need
to shard, so a sharding factor of 1), how should I set up my S3
configuration? Do I need to write the data from both EC2 instances to
S3 in separate paths, or in the same path? Are they written as separate
files, or do both use the same file chunks?
My last question: what is the recommended way to back up, e.g. copying
the S3 chunks daily to a backup folder?
If you are using the S3 gateway, you can control the snapshot interval (it defaults to 10 seconds) with the index.gateway.snapshot_interval setting. There is also a snapshot API for triggering a snapshot explicitly.
Note that a snapshot operation only snapshots delta changes, not the whole data set every time.
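To make the "delta only" point concrete, here is a minimal Python sketch of the idea. It is an illustration only, not Elasticsearch's actual gateway code: the function names, the use of SHA-1 fingerprints, and the file names in the usage note are all assumptions made for the example.

```python
import hashlib
from typing import Dict, List, Tuple


def fingerprint(data: bytes) -> str:
    """Content hash used to detect whether a file changed (assumed scheme)."""
    return hashlib.sha1(data).hexdigest()


def plan_snapshot(local: Dict[str, bytes],
                  remote: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Work out what a single snapshot pass has to do.

    local  maps file name -> current bytes in the node's data location
    remote maps file name -> fingerprint already stored in the gateway
    Returns (names to upload, names to delete), both sorted.
    """
    upload = sorted(name for name, data in local.items()
                    if remote.get(name) != fingerprint(data))
    delete = sorted(name for name in remote if name not in local)
    return upload, delete


def apply_snapshot(local: Dict[str, bytes],
                   remote: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Apply the plan to the in-memory 'remote' and report what was transferred."""
    upload, delete = plan_snapshot(local, remote)
    for name in upload:
        remote[name] = fingerprint(local[name])
    for name in delete:
        del remote[name]
    return upload, delete
```

With an empty `remote`, the first call to `apply_snapshot` uploads every file. If one hypothetical file such as `"seg_c"` is then added locally, the next call uploads only that file, and a call with no local changes transfers nothing.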
Setting it up just requires setting the S3 gateway parameters on all nodes. A cluster uses a single bucket (and inner path) for it. The local data on the EC2 instances is stored under the data location.
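As a rough sketch of "the same gateway parameters on all nodes", the Python below builds one flat settings map and reuses it for every node. Only index.gateway.snapshot_interval comes from this thread. The gateway.type and gateway.s3.bucket keys are recalled from documentation of that era and should be checked against the version in use. The "10s" value format, the bucket name, and the node names are hypothetical, and AWS credentials are deliberately left out.

```python
from typing import Dict


def s3_gateway_settings(bucket: str,
                        snapshot_interval: str = "10s") -> Dict[str, str]:
    """Flat settings shared by every node in the cluster (key names assumed)."""
    return {
        "gateway.type": "s3",                 # assumed key name
        "gateway.s3.bucket": bucket,          # assumed key name; one bucket per cluster
        "index.gateway.snapshot_interval": snapshot_interval,  # from the thread
    }


def render(settings: Dict[str, str]) -> str:
    """Render the settings as sorted 'key: value' lines for a config file."""
    return "\n".join(f"{key}: {value}" for key, value in sorted(settings.items()))


# Hypothetical two-node cluster: both nodes get identical gateway settings,
# so they write to the same bucket rather than to separate paths.
nodes = ["node-a", "node-b"]
configs = {node: s3_gateway_settings("my-es-gateway") for node in nodes}

if __name__ == "__main__":
    for node in nodes:
        print(f"# {node}")
        print(render(configs[node]))
```

The point of the design is that nothing in the map is node-specific: with two replicas of one shard, both instances still share one bucket and one configuration.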
Finally, consider using the local gateway on AWS. Depending on your availability requirements, you can store the data either on local instance storage or on EBS.
On Sunday, January 30, 2011 at 5:12 PM, si wrote:
Thanks Shay. Elasticsearch is very intelligently designed: using EC2
local instance storage and periodically snapshotting only the deltas
to S3 is very cost-effective and rock solid when combined with
replicas, and it also gives LB and HA capability. Perfect.