ES backups without using snapshots?

Hi there,

Any suggestions as to how I can create full ES backups without using
snapshot functionality?

I can't use snapshots because they require a shared directory
mounted on all nodes, but my 3-node cluster spans two data centres and I am
not able to NFS mount over the WAN. I'm also not permitted to back up to
AWS/S3.

As I have 2 replicas of each index, I'm leaning towards the idea of
stopping one node and backing up that node's data directory but wondered if
anyone could suggest a more elegant way. For example, could I snapshot to
a local directory on each node, then manually combine the contents into a
single cohesive backup?

Regards,
Mat

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/f0b8a931-c423-4a37-a6df-5181bd4db309%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

How many shards for each index? I am assuming that each node does not have
all the data.

If you can stop indexing, you can just rsync the data to a local directory.
Make sure you execute a flush and preferably an optimize in order to merge
the segments on disk. The tricky part is the manual combine you referred to.
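
The flush, optimize, and rsync steps described above could be scripted roughly as follows. This is a minimal sketch, not a tested backup procedure: the helper names are hypothetical, and any host or path passed in is an assumption about the local setup. `_flush` and `_optimize` are the REST endpoints in the 1.x releases current at the time of this thread; later versions replaced `_optimize` with `_forcemerge`. Indexing is assumed to be stopped before the script runs.

```python
import subprocess
import urllib.request


def prepare_urls(host, optimize=True):
    """Return the REST endpoints to POST to before copying files."""
    base = host.rstrip("/")
    urls = [base + "/_flush"]
    if optimize:
        # ES 1.x endpoint; newer releases use /_forcemerge instead.
        urls.append(base + "/_optimize")
    return urls


def rsync_command(data_dir, dest_dir):
    """Build an rsync invocation that mirrors data_dir into dest_dir."""
    # -a preserves permissions and timestamps; --delete removes files
    # from the copy that no longer exist in the source directory.
    # The trailing slash makes rsync copy the directory's contents.
    return ["rsync", "-a", "--delete", data_dir.rstrip("/") + "/", dest_dir]


def backup(host, data_dir, dest_dir):
    """Flush and optimize via REST, then copy the data directory."""
    for url in prepare_urls(host):
        request = urllib.request.Request(url, method="POST")
        with urllib.request.urlopen(request) as response:
            response.read()  # wait for the operation to complete
    subprocess.run(rsync_command(data_dir, dest_dir), check=True)
```

A call such as `backup("http://localhost:9200", "/var/lib/elasticsearch", "/backup/es")` would run on each node in turn; the host and both paths here are placeholders, not values from this thread. The sketch does not address combining the per-node copies into one backup.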

BTW, 3 nodes/2 data centers? Sounds like a recipe for trouble. :)

Cheers,

Ivan

Try https://github.com/taskrabbit/elasticsearch-dump. You can save your
data (& mappings) to JSON.

Hi Ivan,

Thanks for the quick response. We've got 5 shards per index, so with 2
replicas each node should in theory have a full set of data. I was hoping
that taking the node out of service by stopping it would avoid disruption
as a result of pausing indexing, but I couldn't find any documentation to
confirm if such an operation would leave the data files in a consistent
state that could reliably be used for restore.
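
Whether a single node really holds a full set of data can be checked before relying on its data directory. With 5 primaries and 2 replicas there are 15 shard copies per index, and since Elasticsearch does not place two copies of the same shard on one node, a fully allocated 3-node cluster ends up with one copy of every shard on each node. A rough sketch of that check, assuming the output of `GET /_cat/shards?h=index,shard,prirep,state,node` has been saved as text and that every index has the same number of primary shards; the function name, index name, and node names are made up for illustration:

```python
def missing_shards(cat_shards_text, node, expected_shards):
    """Return {index: [shard numbers]} that `node` does not hold as STARTED."""
    held = {}        # index -> set of shard numbers started on `node`
    indices = set()  # every index seen in the listing
    for line in cat_shards_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # blank or malformed line
        index, shard, _prirep, state = fields[:4]
        indices.add(index)
        # Unassigned shards have no node column; node names may contain spaces.
        shard_node = " ".join(fields[4:]) if len(fields) > 4 else None
        if state == "STARTED" and shard_node == node:
            held.setdefault(index, set()).add(int(shard))
    wanted = set(range(expected_shards))
    gaps = {}
    for index in sorted(indices):
        absent = sorted(wanted - held.get(index, set()))
        if absent:
            gaps[index] = absent
    return gaps


# Hypothetical listing for a 2-shard index, with one replica unassigned.
sample = """\
logs 0 p STARTED node-a
logs 0 r STARTED node-b
logs 0 r STARTED node-c
logs 1 r STARTED node-a
logs 1 p STARTED node-b
logs 1 r UNASSIGNED
"""

print(missing_shards(sample, "node-a", 2))  # {}
print(missing_shards(sample, "node-c", 2))  # {'logs': [1]}
```

An empty result for a node means every shard of every listed index has a started copy there. It says nothing about whether the files on disk are in a consistent state once the node is stopped.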

Evan's suggestion of elasticdump looks like the closest to what I'm after,
although unfortunately I don't have node.js/npm installed (and in an
enterprise environment, getting it installed could be tricky).

NB I hear your concerns re cluster design. The remote node was incorporated
to minimise data loss following a data centre failure; however, because of
the risk of split brain, the node actually functions more as a warm DR
node than as any sort of HA...

Regards,
Mat

I have never used plugins, but there is also Jorg's tool:

--
Ivan
