Can't stop a snapshot running on my cluster


(Andrew Vos) #1

A few days ago I started a snapshot, but instead of using a shared network
filesystem I used the local filesystem. Because my root partition (where I
stored the snapshots) only had 8 GB, it filled up and three of my seven
Elasticsearch boxes crashed almost instantly.
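
One way to guard against this kind of failure is to check free space at the repository location before starting a snapshot. The following is only a minimal sketch: the helper name `has_free_kb` and the size threshold are hypothetical, and for a filesystem repository the check would presumably need to run on every node that writes to that path.

```shell
# Hypothetical helper: succeed only if the filesystem holding directory $1
# has at least $2 kilobytes available.
has_free_kb() {
  # df -P prints one POSIX-format line per filesystem; column 4 is the
  # available space, reported in 1024-byte blocks because of -k.
  avail_kb=$(df -Pk "$1" | awk 'NR==2 {print $4}')
  [ "$avail_kb" -ge "$2" ]
}

# Example usage (the path comes from the error message in this thread,
# the 50 GB threshold is an arbitrary illustration):
# has_free_kb /ebs/snapshot-backup 52428800 || echo "not enough space for a snapshot"
```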

I've since created a new cluster and let the data replicate over. The
problem now is, I can't seem to cancel this snapshot!

Looking at the snapshot:
curl -XGET "localhost:9999/_snapshot/production_backup/_snapshot1"
{"error":"RemoteTransportException[[Smuggler][inet[/172.17.0.2:9300]][cluster/snapshot/get]]; nested: SnapshotMissingException[[production_backup:_snapshot1] is missing]; nested: FileNotFoundException[/ebs/snapshot-backup/snapshot-_snapshot1 (No such file or directory)]; ","status":404}

Starting a new snapshot:
curl -XPUT "localhost:9999/_snapshot/production_backup/snapshot_1?wait_for_completion=false"
{"error":"RemoteTransportException[[Smuggler][inet[/172.17.0.2:9300]][cluster/snapshot/create]]; nested: ConcurrentSnapshotExecutionException[[production_backup:snapshot_1] a snapshot is already running]; ","status":503}

This just never completes:

curl -XDELETE "localhost:9999/_snapshot/production_backup/snapshot_1"

Any idea how I can solve this?

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/af485e13-dc74-4e88-b6db-e3a4d67fb00c%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(Igor Motov) #2

Which version of Elasticsearch are you using? Can you send me the current
cluster state?


(Andrew Vos) #3

1.0.0. What do you mean by state exactly?


(Andrew Vos) #4

Right, OK, here's the cluster state: https://gist.github.com/AndrewVos/29de3c6735bbd7808a81


(Igor Motov) #5

I meant the output of the cluster state command
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/cluster-state.html#cluster-state):

curl -XGET 'http://localhost:9200/_cluster/state'

It might be large and will contain information about your cluster that you
might not want to share publicly (index mappings). If this is the case,
please feel free to send it to me by email.
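
Because the output can be large, it may be easier to save it to a file and check whether it mentions a snapshot before sharing it. This is only a rough sketch: the helper name `state_mentions_snapshot` is hypothetical, and the `"snapshots"` key it searches for is an assumption about how an in-progress snapshot appears in the cluster state, not something confirmed in this thread.

```shell
# Hypothetical helper: succeed if a saved cluster-state dump contains a
# "snapshots" key. The key name is an assumption, not a confirmed field.
state_mentions_snapshot() {
  # $1: path to a file holding the JSON returned by /_cluster/state
  grep -q '"snapshots"' "$1"
}

# Example usage (host and port as in the command above):
# curl -s -XGET 'http://localhost:9200/_cluster/state?pretty' > cluster_state.json
# wc -c cluster_state.json   # size in bytes, to judge whether it is practical to share
# state_mentions_snapshot cluster_state.json && echo "snapshot entry found"
```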

Igor


(Igor Motov) #6

It was caused by this bug: https://github.com/elasticsearch/elasticsearch/issues/5958
The only recovery option right now is a full cluster restart.


(Andrew Vos) #7

Ok. While you're here, one other question I would like answered:

I have 10 nodes in a cluster. I want to break out three nodes into a
different cluster as a kind of backup to test out this full cluster
restart. Would it be safe to just block the other three nodes from
connecting to the main cluster? Would they form their own?


(Igor Motov) #8

If your cluster is set up correctly (with a proper value set for
discovery.zen.minimum_master_nodes), they shouldn't. But if you are
running without discovery.zen.minimum_master_nodes set, they might indeed
form a new cluster. Obviously, some shards might end up in one cluster and
not in the other, and if you are indexing while this is happening you will
lose some data. I would say it's a pretty... extreme way to test a full
cluster restart.
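
To make the point about discovery.zen.minimum_master_nodes concrete, here is a small sketch of the usual quorum arithmetic, (master-eligible nodes / 2) + 1. It assumes all 10 nodes are master-eligible, which the thread does not state, and the function names are hypothetical.

```shell
# Quorum of master-eligible nodes: integer division by 2, plus 1.
minimum_master_nodes() {
  # $1: number of master-eligible nodes in the cluster
  echo $(( $1 / 2 + 1 ))
}

# Succeed if a group of $1 mutually visible master-eligible nodes meets the
# configured minimum $2 and could therefore elect a master.
can_elect_master() {
  [ "$1" -ge "$2" ]
}

quorum=$(minimum_master_nodes 10)
echo "minimum_master_nodes for 10 nodes: $quorum"

if can_elect_master 3 "$quorum"; then
  echo "3 isolated nodes could form their own cluster"
else
  echo "3 isolated nodes cannot elect a master"
fi
```

With the setting at this value, the three blocked nodes stay below the quorum while the remaining seven still meet it, which matches the "they shouldn't" case above.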


(Andrew Vos) #9

Well it's the only way I can do it without downtime. Unless of course by
"full cluster restart" you mean restarting one node at a time?


(Igor Motov) #10

Yes, by "full cluster restart" I meant shutting down all nodes and then
starting them up again, which means downtime. However, after thinking about
the issue over the long weekend, I wrote a simple utility that cleans up
snapshots without the need to restart the cluster.

