A few days ago I started a snapshot, but instead of using a shared network filesystem I used the local filesystem. Because my root partition only had 8GB (and this is where I stored the snapshots), the partition filled up and three of my seven Elasticsearch boxes crashed almost instantly.
I've since created a new cluster and let the data replicate over. The problem now is that I can't seem to cancel this snapshot!
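For context, the repository was presumably registered as a filesystem ("fs") repository along these lines; the repository name is taken from the requests below and the location is inferred from the path in the error output, so treat the exact values as assumptions:

curl -XPUT "localhost:9999/_snapshot/production_backup" -d '{
  "type": "fs",
  "settings": { "location": "/ebs/snapshot-backup" }
}'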
Looking at the snapshot:

curl -XGET "localhost:9999/_snapshot/production_backup/_snapshot1"
{"error":"RemoteTransportException[[Smuggler][inet[/172.17.0.2:9300]][cluster/snapshot/get]]; nested: SnapshotMissingException[[production_backup:_snapshot1] is missing]; nested: FileNotFoundException[/ebs/snapshot-backup/snapshot-_snapshot1 (No such file or directory)]; ","status":404}
Starting a new snapshot:

curl -XPUT "localhost:9999/_snapshot/production_backup/snapshot_1?wait_for_completion=false"
{"error":"RemoteTransportException[[Smuggler][inet[/172.17.0.2:9300]][cluster/snapshot/create]]; nested: ConcurrentSnapshotExecutionException[[production_backup:snapshot_1] a snapshot is already running]; ","status":503}
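For reference, the documented way to abort a running snapshot on 1.x is to delete it; given that the files under /ebs/snapshot-backup are gone, a call like the following presumably fails with the same SnapshotMissingException (this is a sketch, not an attempt from the original report):

curl -XDELETE "localhost:9999/_snapshot/production_backup/_snapshot1"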
Which version of Elasticsearch are you using? Can you send me the current cluster state?
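For reference, the state can be pulled with something like the request below; on 1.x an in-progress snapshot is also recorded in the cluster state, which is what makes it useful for debugging a stuck one (the exact output shape is not shown here):

curl -XGET "localhost:9999/_cluster/state?pretty"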
It might be large and will contain information about your cluster that you
might not want to share publicly (index mappings). If this is the case,
please feel free to send it to me by email.
Igor
On Saturday, May 24, 2014 2:18:14 PM UTC-4, Andrew Vos wrote:
1.0.0. What do you mean by state exactly?
OK. While you're here, one other question I would like answered:
I have 10 nodes in a cluster. I want to break out three nodes into a different cluster as a kind of backup to test out this full cluster restart. Would it be safe to just block those three nodes from connecting to the main cluster? Would they form their own cluster?
If your cluster is set up correctly (with a proper value set for discovery.zen.minimum_master_nodes) they shouldn't. But if you are running without discovery.zen.minimum_master_nodes set, they might indeed form a new cluster. Obviously some shards might end up in one cluster and not in the other, and if you are indexing while this is happening you will lose some data. I would say it's a pretty... extreme way to test a full cluster restart.
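As a sketch of what that would mean here: with 10 master-eligible nodes a majority is 6, and the setting can be applied dynamically through the cluster settings API (the node count comes from the thread; that all 10 nodes are master-eligible is an assumption):

curl -XPUT "localhost:9999/_cluster/settings" -d '{
  "persistent": { "discovery.zen.minimum_master_nodes": 6 }
}'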
Well it's the only way I can do it without downtime. Unless of course by
"full cluster restart" you mean restarting one node at a time?
On Sat, May 24, 2014 at 7:35 PM, Igor Motov imotov@gmail.com wrote:
It was caused by this bug: https://github.com/elasticsearch/elasticsearch/issues/5958
The only recovery option right now is a full cluster restart.
Yes, by "full cluster restart" I meant shutting down all nodes and then
starting them up again, which means downtime. However, after thinking about
the issue over the long weekend, I wrote a simple utility that cleans up
snapshots without need to restart the cluster
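For completeness, a rough sketch of the shutdown half of that on 1.x, which still had the nodes shutdown API (bringing the nodes back up again is left to whatever service manager runs them):

curl -XPOST "localhost:9999/_shutdown"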