Fs gateway snapshots


(Diptamay) #1

The problem: snapshotting only happens when I shut down the node
that I am running, not every 30 seconds as I would expect from the
configuration below. Did I configure something incorrectly, or am I
misunderstanding when snapshots take place?

The configuration that I have set up is:

gateway:
type: fs
fs:
location: /Users/sanyal/Documents/workspace/hb_indices/meta
index:
gateway:
type: fs
fs:
location: /Users/sanyal/Documents/workspace/hb_indices/snapshot
snapshot_interval : 30
snapshot_on_close : true
memory:
enabled: true
store:
type: niofs
number_of_shards : 3
number_of_replicas : 2
path:
logs: /Users/sanyal/Documents/workspace/logs

As background, I am prototyping ES for use in our in-house CMS
application. Right now ES is set up only on my laptop, a MacBook
with a 2.26 GHz Core 2 Duo and 4 GB of RAM. I am also using only
one node for indexing and searching.


(Shay Banon) #2

Hi, here is the updated configuration that should work:

gateway:
  type: fs
  fs:
    location: /path/to/gateway/location
index:
  gateway:
    snapshot_interval: 30s
  number_of_shards: 3
  number_of_replicas: 2
path:
  logs: /path/to/logs
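
As an illustration of what "the index.gateway level" means in the notes that follow, here is a minimal Python sketch that flattens a nested configuration into dotted setting names. The `flatten` helper and the `config` dictionary are hypothetical and written only for this example; they mirror the configuration above and are not elasticsearch code.

```python
def flatten(settings, prefix=""):
    """Flatten nested settings into dotted names, e.g. index.gateway.snapshot_interval."""
    flat = {}
    for key, value in settings.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Descend one level and extend the dotted prefix.
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


# Hypothetical dictionary mirroring the configuration in this post.
config = {
    "gateway": {"type": "fs", "fs": {"location": "/path/to/gateway/location"}},
    "index": {
        "gateway": {"snapshot_interval": "30s"},
        "number_of_shards": 3,
        "number_of_replicas": 2,
    },
    "path": {"logs": "/path/to/logs"},
}

for name, value in sorted(flatten(config).items()):
    print(f"{name}: {value}")
```

Read this way, the snapshot interval sits at index.gateway.snapshot_interval, while the gateway type and location sit under gateway only.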


Some notes regarding the configuration:

  1. You should define the gateway type as fs only on the gateway level; the
    index level will automatically use fs as well. Likewise, the path should be
    defined only on the gateway level, and the index level will reuse it.

  2. The snapshot interval is defined on the index.gateway level. Note that a
    time value in elasticsearch without a unit is read as milliseconds, so 30
    means 30 milliseconds and you need to write 30s. Also, I am surprised that
    you did not see snapshotting happen, since the default is 10s. Note that a
    snapshot only happens if there are changes.

  3. I removed the other settings that are the default, like snapshot on
    close. If you do want to set it, it is also on the index.gateway level.

  4. You set number_of_shards and number_of_replicas. This means that
    these are now the default values for any index created, unless explicitly
    specified in the create index API. This applies to all index level settings.
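
To make point 2 concrete, here is a minimal sketch of how a time value with and without a unit could be interpreted. This is not elasticsearch's actual parser: the function name `parse_time_value_millis` and the set of suffixes in `_UNIT_MILLIS` are assumptions made for illustration. The only behaviour taken from the note above is that a bare number is read as milliseconds.

```python
import re

# Hypothetical unit table; the accepted suffixes are an assumption for this sketch.
_UNIT_MILLIS = {"ms": 1, "s": 1000, "m": 60_000, "h": 3_600_000}


def parse_time_value_millis(text):
    """Return the value in milliseconds; a bare number is taken as milliseconds."""
    match = re.fullmatch(r"(\d+)\s*(ms|s|m|h)?", str(text).strip())
    if match is None:
        raise ValueError(f"unrecognised time value: {text!r}")
    amount, unit = match.groups()
    # No suffix means the number is already in milliseconds.
    return int(amount) * _UNIT_MILLIS[unit or "ms"]


print(parse_time_value_millis("30"))   # the original setting: 30 milliseconds
print(parse_time_value_millis("30s"))  # the corrected setting: 30 seconds
```

Under this reading, `snapshot_interval : 30` asks for a snapshot every 30 milliseconds rather than every 30 seconds.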

-shay.banon


(Diptamay) #3

Hi Shay

Thanks for the configuration. I see that snapshotting just keeps
updating the segments_N and translog files. Shouldn't the segments.gen
and *.fdx files be backed up as well?

Correct me if I am wrong: if I stop and start the search server, does the
whole index get rebuilt from the translog? For example:

[15:57:17,257][INFO ][node ] [Metalhead] {ElasticSearch/0.8.0}[37916]: Initializing ...
[15:57:17,261][INFO ][plugins ] [Metalhead] Loaded []
[15:57:18,344][INFO ][node ] [Metalhead] {ElasticSearch/0.8.0}[37916]: Initialized
[15:57:18,344][INFO ][node ] [Metalhead] {ElasticSearch/0.8.0}[37916]: Starting ...
[15:57:18,435][INFO ][transport ] [Metalhead] bound_address[inet[/0.0.0.0:9300]], publish_address[inet[/169.254.180.96:9300]]
[15:57:21,571][INFO ][cluster.service ] [Metalhead] New Master [Metalhead][aa6c1c96-7b8d-4943-927c-33816a48ee9e][inet[/169.254.180.96:9300]], Reason: zen-disco-initial_connect(master)
[15:57:21,611][INFO ][discovery ] [Metalhead] elasticsearch/aa6c1c96-7b8d-4943-927c-33816a48ee9e
[15:57:21,635][INFO ][cluster.metadata ] [Metalhead] Creating Index [hb_14], cause [gateway], shards [3]/[2], mappings [audio, article, page]
[15:57:21,980][INFO ][http ] [Metalhead] bound_address[inet[/0.0.0.0:9200]], publish_address[inet[/169.254.180.96:9200]]
[15:57:22,236][INFO ][jmx ] [Metalhead] bound_address[service:jmx:rmi:///jndi/rmi://:9400/jmxrmi], publish_address[service:jmx:rmi:///jndi/rmi://169.254.180.96:9400/jmxrmi]

If this is so, wouldn't this be a costly thing to do in production
with millions of documents?

Thanks
Diptamay


(Shay Banon) #4

The translog is there so that a flush (in elasticsearch terms, which maps to
performing a Lucene commit) does not need to be performed for each
operation. By default, a flush is executed after 5000 docs have been added
to the translog, at which point a commit is done and a new translog gets
created. Until then, there are no "new" files in the index, so only the
translog gets snapshotted to the gateway.

So, to your question: at most, only 5000 docs will need to be
reapplied to a shard recovered from the gateway, not all the changes
made, and this is manageable.
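
The bound can be seen in a small simulation. This is only a sketch of the behaviour described above, not elasticsearch code: the `ShardSketch` class and its method names are hypothetical, and the only number taken from this post is the default threshold of 5000.

```python
FLUSH_THRESHOLD = 5000  # default number of translog operations before a flush, per this post


class ShardSketch:
    """Toy model of a shard: committed docs plus a translog of uncommitted operations."""

    def __init__(self, flush_threshold=FLUSH_THRESHOLD):
        self.flush_threshold = flush_threshold
        self.committed_docs = 0  # docs covered by the last commit (new index files)
        self.translog = []       # operations recorded since the last flush
        self.flushes = 0

    def index(self, doc):
        self.translog.append(doc)
        if len(self.translog) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Commit everything in the translog, then start a new, empty translog.
        self.committed_docs += len(self.translog)
        self.translog = []
        self.flushes += 1

    def ops_to_replay_on_recovery(self):
        # Only operations still in the translog are reapplied after recovery.
        return len(self.translog)


shard = ShardSketch()
for i in range(101_234):
    shard.index({"id": i})

print(shard.flushes, shard.committed_docs, shard.ops_to_replay_on_recovery())
```

However many documents are indexed, the number of operations left to replay stays below the flush threshold, because each flush empties the translog.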

-shay.banon


(Diptamay) #5

Hi Shay

Thanks for the info. I see the *.cfs files getting snapshotted now
that I have loaded about 100k documents.

-Diptamay

