Gateway Snapshot settings & unexpected behaviour (maybe bug?)

Way back I added this feature request
https://github.com/elasticsearch/elasticsearch/issues/906 which got added,
many thanks.

Except it doesn't seem to be working as we understand it. We have a 2-node
setup, with the gateway snapshot set to fs:

gateway:
  type: fs
  fs:
    location: /mnt/esgateway/

We then do the following:

  • set the disable_flush property to true as outlined in Issue #906
  • set the snapshot interval to 0 (index.gateway.snapshot_interval)
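
For illustration, the two steps above could be expressed as a single settings update. This is only a sketch: the host, the index name "myindex", and the key index.translog.disable_flush are assumptions (only index.gateway.snapshot_interval is named in this thread), and the request is built but never sent.

```python
import json
import urllib.request


def build_settings_request(base_url, index, settings):
    """Build (but do not send) a PUT request for an index's _settings endpoint."""
    body = json.dumps(settings).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host and index name. The disable_flush key is an assumed
# spelling; check Issue #906 for the exact setting name.
req = build_settings_request(
    "http://localhost:9200",
    "myindex",
    {
        "index.translog.disable_flush": True,
        "index.gateway.snapshot_interval": 0,
    },
)

# urllib.request.urlopen(req) would send it; omitted so the sketch has no side effects.
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```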

However, when mutating the index under these settings, we see via the
filesystem and via TRACE-level logging that the node responsible for the
gateway snapshot, while reporting that disable_flush and snapshot_interval
were set correctly, promptly takes a snapshot as soon as we index
something, mutating the snapshot directory state. This means the snapshot
directory can't be safely copied, if I understand correctly. We expected
that setting disable_flush, disabling the snapshot, and performing a manual
flush would allow a safe copy, with no filesystem mutations until we
re-enable them.

Here's a gist of the logs: https://gist.github.com/1372395

Feels like either we're misinterpreting something or this is a bug.

As a side note, a GET on the _settings API only reports a small set of
values initially; only after we modify disable_flush and snapshot_interval
via the _settings API do those values start appearing. Is there any reason
that all settings values shouldn't always be displayed?

cheers,

Paul

The disable flush flag was introduced to make backing up simpler when using
the local gateway. It's not really needed with a shared gateway (like the fs
shared one you use), where disabling the snapshot interval should be
enough.

Are you saying that when you set the snapshot interval to 0, a snapshot
still happens?

Regarding the settings, the ones returned by the get settings API are only
the ones explicitly set; it does not return settings with "default" values.
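
Since only explicitly set values come back, one way to see an "effective" view is to overlay the response on a table of defaults kept on the client side. A minimal sketch follows; the defaults table is a hypothetical placeholder for illustration, not values taken from the Elasticsearch source or docs.

```python
# Hypothetical defaults table. The values are placeholders, not the real
# Elasticsearch defaults; fill them in from the documentation.
ASSUMED_DEFAULTS = {
    "index.gateway.snapshot_interval": "10s",
    "index.translog.disable_flush": "false",
}


def effective_settings(explicit, defaults=ASSUMED_DEFAULTS):
    """Overlay explicitly set values (as returned by GET _settings) on defaults."""
    merged = dict(defaults)   # copy so the defaults table is never mutated
    merged.update(explicit)   # explicit values win over defaults
    return merged


# Before anything is changed, GET _settings reports none of these keys:
print(effective_settings({}))

# After snapshot_interval is set explicitly, it overrides the assumed default:
print(effective_settings({"index.gateway.snapshot_interval": "-1"}))
```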

On Thu, Nov 17, 2011 at 7:03 AM, Paul Smith tallpsmith@gmail.com wrote:

On 20 November 2011 18:54, Shay Banon kimchy@gmail.com wrote:

The disable flush flag was introduced to make backing up simpler when using
the local gateway. It's not really needed with a shared gateway (like the fs
shared one you use), where disabling the snapshot interval should be
enough.

Ok, thanks, that makes sense.

Are you saying that when you set the snapshot interval to 0, a snapshot
still happens?

Yes. When set to 0, periodic/timed snapshots stop happening, but as soon as
we index something a snapshot happens (see the log gist). We tried
setting the snapshot interval to a large number instead, and that does
work, EXCEPT that when resetting the interval one has to wait for the
larger interval to elapse before the new setting takes effect. This is
presumably because the thread sleeps for that larger interval and isn't
woken up when the configuration changes.

Should I write up a bug report for snapshot_interval=0 not working?

Regarding the settings, the ones returned by the get settings API are only
the ones explicitly set; it does not return settings with "default" values.

Is there any way, other than looking at the docs, to find out what value a
particular setting actually has?

I posted this on the issue you opened, but can you try setting
snapshot_interval to -1? A value of 0 will not disable it properly.

Regarding the other problem, changing the snapshot interval should take
effect immediately: it cancels the previous periodic task and schedules a
new one.
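
To illustrate the cancel-and-reschedule pattern described here, below is a minimal Python sketch. It is not the Elasticsearch implementation; the class and method names are hypothetical, and treating every non-positive interval as "disabled" is a choice made for this sketch only (in the thread, 0 does not disable snapshots but -1 does).

```python
import threading


class PeriodicTask:
    """Runs `action` every `interval` seconds; changing the interval applies at once."""

    def __init__(self, action):
        self._action = action
        self._lock = threading.Lock()
        self._timer = None
        self._interval = -1
        self._generation = 0  # bumped on every change so stale timers stop themselves

    def set_interval(self, seconds):
        with self._lock:
            self._generation += 1
            if self._timer is not None:
                self._timer.cancel()  # drop the pending run instead of waiting it out
                self._timer = None
            self._interval = seconds
            if seconds > 0:           # sketch-only rule: non-positive means disabled
                self._schedule_locked()

    def _schedule_locked(self):
        # Caller must hold self._lock.
        self._timer = threading.Timer(
            self._interval, self._run, args=(self._generation,)
        )
        self._timer.daemon = True
        self._timer.start()

    def _run(self, generation):
        with self._lock:
            if generation != self._generation:
                return  # the interval changed after this timer was created
        self._action()
        with self._lock:
            if generation == self._generation:
                self._schedule_locked()

    def is_scheduled(self):
        with self._lock:
            return self._timer is not None
```

With this shape, going from a very large interval to a small one does not wait for the large one to elapse, because set_interval cancels the pending timer before scheduling the next.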

On Mon, Nov 21, 2011 at 12:18 AM, Paul Smith tallpsmith@gmail.com wrote:
