We are running 5 clusters, each with 3 nodes.
Do I need to rsync only the "data" directory? Is it enough to take a
backup from any one node in each cluster? Also, can I recover the cluster
from that data directory if data is lost in a crash?
In the script above, flushing is disabled to stop documents from being
indexed on that node (am I right?). What happens if documents are indexed
during that time?
We are running 5 clusters, each with 3 nodes.
Do I need to rsync only the "data" directory? Is it enough to take a
backup from any one node in each cluster? Also, can I recover the cluster
from that data directory if data is lost in a crash?
Yes, although to be on the safe side, I'd shut down all 3 nodes, restore
all 3 backups, and restart the nodes. It's nice to have your data
consistent throughout the cluster.
In the script above, flushing is disabled to stop documents from being
indexed on that node (am I right?). What happens if documents are indexed
during that time?
No, indexing still works. Only the flushing of those indexed docs to disk
is temporarily suspended. Normally, indexing happens in memory, which is
very fast; then every once in a while those changes are flushed to the
actual Lucene index on disk, according to your transaction log settings:
http://www.elasticsearch.org/guide/reference/index-modules/translog/
When you disable flushing, this writing to disk doesn't happen until you
enable flushing again. I guess if a lot of indexing happens while you back
up, you could run out of memory, but I've never heard anyone complain about
that. It depends on how much indexing usually happens and how much memory
you have.
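To make the sequence concrete, here is a rough sketch of the disable-flush, rsync, re-enable-flush cycle described above. It is not the script you are referring to, just an illustration. The node address, both directory paths, and the rsync options are placeholders I made up, and the `index.translog.disable_flush` setting name is what older Elasticsearch releases used, so treat it as an assumption and check it against your version.

```python
import json
import subprocess
import urllib.request

# All three values below are hypothetical placeholders; adjust for your setup.
ES_URL = "http://localhost:9200"
DATA_DIR = "/var/lib/elasticsearch/data/"
BACKUP_DIR = "/backup/elasticsearch/data/"


def flush_settings_body(disabled):
    """Build the JSON body that toggles flushing.

    Assumes the index.translog.disable_flush setting from older releases.
    """
    return json.dumps({"index": {"translog.disable_flush": disabled}}).encode("utf-8")


def set_flush_disabled(disabled, es_url=ES_URL):
    """PUT the setting to /_settings so it applies to all indices."""
    req = urllib.request.Request(
        es_url + "/_settings",
        data=flush_settings_body(disabled),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


def rsync_command(src=DATA_DIR, dest=BACKUP_DIR):
    """Command line that copies the data directory to the backup location."""
    return ["rsync", "-a", "--delete", src, dest]


def backup(run=subprocess.check_call, toggle=set_flush_disabled):
    """Disable flushing, copy the data directory, then always re-enable flushing."""
    toggle(True)
    try:
        run(rsync_command())
    finally:
        # Re-enable even if rsync fails, so flushing is not left suspended.
        toggle(False)
```

Calling `backup()` on a node would run the whole cycle. The `try`/`finally` is the part that matters: flushing gets switched back on even when the copy fails, so indexing never keeps piling up unflushed changes longer than the backup itself takes.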