Configuring periodic snapshotting?


(Alex Lambert) #1

Hi folks,

elasticsearch/modules/elasticsearch/src/main/java/org/elasticsearch/
action/admin/indices/gateway/snapshot/GatewaySnapshotRequest.java
says:

  Gateway snapshot allows to explicitly perform a snapshot through
  the gateway of one or more indices (backup them).
  By default, each index gateway periodically snapshot changes,
  though it can be disabled and be controlled completely
  through this API. Best created using {@link
  org.elasticsearch.client.Requests#gatewaySnapshotRequest(String...)}.

I'd like to learn more about this, but I wasn't able to find any code or
configuration knobs for the periodic snapshot feature. I understand
that this may not be documented yet; if it isn't, where should I look
for the code?

Thanks for your help!

Best,
Alex


(Clinton Gormley) #2

Hi Alex

First of all, why do you not want to use the local (default) gateway? It
performs best.

Second, snapshotting in the shared gateway happens automatically; you
don't need to think about it.

If you want to back up a data dir, you should just be able to copy or
rsync it. To be sure that no segments have merged in the meantime, you
can just run a second rsync on it, which should be much faster.
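
The two-pass copy described above could be scripted roughly as follows. This is only a sketch: the `-a` and `--delete` flags, the function names, and the paths in the usage note are assumptions chosen for illustration, not something stated in this thread.

```python
import subprocess


def rsync_command(src, dst):
    """Build an rsync command that mirrors the contents of src into dst."""
    # The trailing slash makes rsync copy the directory's contents rather than
    # the directory itself. -a preserves file metadata; --delete removes files
    # in dst that no longer exist in src (e.g. segments merged away).
    return ["rsync", "-a", "--delete", src.rstrip("/") + "/", dst]


def two_pass_backup(src, dst, run=subprocess.check_call):
    """Run the same rsync twice; the second pass only transfers what changed."""
    cmd = rsync_command(src, dst)
    for _ in range(2):
        run(cmd)
    return cmd
```

Calling `two_pass_backup("/var/data/es", "/backup/es")` (hypothetical paths) would invoke rsync twice. The `run` parameter exists only so the command can be inspected or logged instead of executed.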

clint


(Karussell) #3

On Jul 1, 11:04 pm, Clinton Gormley clin...@iannounce.co.uk wrote:

If you want to back up a data dir, you should just be able to copy or
rsync it. To be sure that no segments have merged in the meantime, you
can just run a second rsync on it, which should be much faster.

There is also a setting you can use to disable flushes, then re-enable
them when you are done:

index.translog.disable_flush

http://www.elasticsearch.org/guide/reference/api/admin-indices-update-settings.html
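
For illustration, a request that toggles this setting through the update settings API might be built like this in Python. The host, the index name in the usage note, and the exact shape of the JSON body are assumptions; check them against the page linked above for your version.

```python
import json
import urllib.request


def flush_setting_request(index, disabled, host="http://localhost:9200"):
    """Build (but do not send) an update-settings request for one index."""
    # Assumed body shape: the "index" prefix is nested, so the key below
    # corresponds to the setting index.translog.disable_flush.
    body = json.dumps({"index": {"translog.disable_flush": disabled}})
    return urllib.request.Request(
        url="{}/{}/_settings".format(host, index),
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

Passing the result to `urllib.request.urlopen(...)` would send it, for example once with `disabled=True` before a copy and once with `disabled=False` afterwards (`"my_index"` being a placeholder index name).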


(Shay Banon) #4

Hi,

Let me explain. With the local gateway (the default one), snapshotting has no meaning, since the data is persisted on each node and restored (on cluster restart) from each node's local data location.

Snapshotting is there for the shared gateway option. It's the process of taking the local data and persisting it (delta-wise) to the shared storage. It runs on a schedule by default, but the gateway snapshot API is there to invoke it explicitly.

As for what Karussell suggested: it's a means of backing up when you are using the local gateway. You can disable translog flushing, make a copy of each node's data location, and then re-enable flushing.

-shay.banon
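
The disable, copy, resume sequence described above could be wrapped so that flushing is always re-enabled, even when the copy fails. A minimal Python sketch, assuming a caller-supplied `set_disable_flush` callable (hypothetical) that applies `index.translog.disable_flush` to the cluster:

```python
import contextlib
import shutil


@contextlib.contextmanager
def flush_disabled(set_disable_flush):
    """Disable translog flushing for the duration of the with-block."""
    set_disable_flush(True)
    try:
        yield
    finally:
        # Always re-enable flushing, even if the copy fails part-way.
        set_disable_flush(False)


def backup_data_dir(data_dir, backup_dir, set_disable_flush):
    """Copy one node's data location while flushing is disabled."""
    with flush_disabled(set_disable_flush):
        # shutil.copytree requires that backup_dir does not exist yet.
        shutil.copytree(data_dir, backup_dir)
```

The `try`/`finally` is the point of the sketch: an index left with flushing disabled would keep growing its translog, so the re-enable step should not depend on the copy succeeding.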

On Saturday, July 2, 2011 at 11:27 PM, Karussell wrote:

There is also a setting you can use to disable flushes, then re-enable
them when you are done:

index.translog.disable_flush

http://www.elasticsearch.org/guide/reference/api/admin-indices-update-settings.html


(Alex Lambert) #5

Great, that makes sense. Thank you all for your help -- much appreciated!

Best,
Alex

On Sat, Jul 2, 2011 at 4:32 PM, Shay Banon shay.banon@elasticsearch.com wrote:

Hi,
Let me explain. With the local gateway (the default one), snapshotting has
no meaning, since the data is persisted on each node and restored (on
cluster restart) from each node's local data location.
Snapshotting is there for the shared gateway option. It's the process of
taking the local data and persisting it (delta-wise) to the shared storage.
It runs on a schedule by default, but the gateway snapshot API is there to
invoke it explicitly.
As for what Karussell suggested: it's a means of backing up when you are
using the local gateway. You can disable translog flushing, make a copy of
each node's data location, and then re-enable flushing.
-shay.banon
