Cluster crashed


(Michael Rennecke) #1

hello,

I have the following problem. My network crashed during the night and one
node died. The cluster has 11 nodes and 3 replicas. I can't start
Elasticsearch with all shards; many shards are unassigned. I can see the
shards on the disks. Can I force Elasticsearch to use these shards?

regards
Michael

--

  • Michael Rennecke *
    Junior Systemarchitekt, Semantic Web Project, IT

Unister Holding GmbH
Barfußgässchen 11 | 04109 Leipzig

Telefon: +49 (0)341 355381 25291
michael.rennecke@unister-gmbh.de
http://www.unister.de/

Vertretungsberechtigter Geschäftsführer: Thomas Wagner
Amtsgericht Leipzig, HRB: 25007


(Shay Banon) #2

You need to figure out which nodes see each other. Is it one node that got
disconnected? The simplest way to bring things back up is to shut down and
restart the cluster. Otherwise, you need to figure out which nodes got
disconnected and restart them.

Which version are you using? To handle this better, I highly recommend
setting discovery.zen.minimum_master_nodes (and using the latest
version). For an 11-node cluster, set it to something like 3 or 4.
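
For comparison, later Elasticsearch documentation for zen discovery recommends setting discovery.zen.minimum_master_nodes to a quorum of master-eligible nodes, (N / 2) + 1, which is higher than the 3 or 4 suggested here. A minimal Python sketch of that arithmetic follows; the function name is illustrative, and it assumes all 11 nodes are master-eligible, which the thread does not state.

```python
def minimum_master_nodes(master_eligible_nodes: int) -> int:
    """Quorum of master-eligible nodes: floor(N / 2) + 1."""
    if master_eligible_nodes < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible_nodes // 2 + 1


if __name__ == "__main__":
    # Assumption: every one of the 11 nodes can be elected master.
    print(minimum_master_nodes(11))  # 6
```

A quorum guarantees that two disconnected halves of the cluster cannot both elect a master; a lower value still helps against a single isolated node but not against an even split.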



(Michael Rennecke) #3

I use version 0.17.6 of Elasticsearch. Restarting the complete cluster
didn't work :( I set discovery.zen.minimum_master_nodes to 3.

Elasticsearch head sees all nodes, and with curl I can also see all nodes.



(Shay Banon) #4

What do you mean it did not work? Is the cluster in RED health after the
restart? If so, you can get some idea of why by setting logging
on gateway.local to DEBUG. Try restarting the cluster with the
index.recovery.initial_shards setting set to 1 and see if it helps (make sure
to set those settings on all nodes, though they are really only needed on the
elected master of the cluster).

-shay.banon
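
To check whether the cluster is RED and how many shards are still unassigned, the cluster health endpoint can be queried. The following is a minimal Python sketch, not something from the thread: the base URL is an assumption, the helper names are illustrative, and the sample values are made up for demonstration.

```python
import json
from urllib.request import urlopen


def summarize_health(health: dict) -> str:
    """Turn a cluster health response into a one-line summary."""
    return "status={status} nodes={nodes} unassigned_shards={unassigned}".format(
        status=health.get("status", "unknown"),
        nodes=health.get("number_of_nodes", "?"),
        unassigned=health.get("unassigned_shards", "?"),
    )


def fetch_health(base_url: str = "http://localhost:9200") -> dict:
    """Fetch the cluster health document. base_url is an assumption."""
    with urlopen(base_url + "/_cluster/health", timeout=10) as response:
        return json.load(response)


if __name__ == "__main__":
    # Made-up sample values, not output from the cluster discussed here.
    sample = {"status": "red", "number_of_nodes": 11, "unassigned_shards": 40}
    print(summarize_health(sample))
```

Against a live node, `summarize_health(fetch_health())` would report the current state; a status of red means at least one primary shard is not allocated.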



(Michael Rennecke) #5

The cluster was in state RED after the restart.

I added gateway.local.initial_shards: 1, which I found in an old posting
(http://elasticsearch-users.115913.n3.nabble.com/Data-lost-after-full-cluster-restart-td3185255.html).
The cluster became yellow and found all shards.

Thanks for your help.

regards
michael
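
One plausible reading of why this helped: as far as the local gateway documentation of that era describes it, initial_shards defaults to quorum, so after a full restart a primary is only allocated once a majority of that shard's copies has been found. The Python sketch below is a simplified model of that rule, not the actual Elasticsearch implementation; the function name is illustrative, only the values "quorum", "full" and a plain number are handled, and any special cases for small replica counts are ignored.

```python
def required_copies(initial_shards: str, number_of_replicas: int) -> int:
    """Copies of a shard that must be found before its primary is allocated.

    Simplified model: 'quorum' is a majority of all copies, 'full' is all
    copies, and anything else is read as an explicit number.
    """
    total_copies = number_of_replicas + 1  # one primary plus its replicas
    if initial_shards == "quorum":
        return total_copies // 2 + 1
    if initial_shards == "full":
        return total_copies
    return int(initial_shards)


if __name__ == "__main__":
    # With 3 replicas there are 4 copies of every shard.
    print(required_copies("quorum", 3))  # 3
    print(required_copies("1", 3))       # 1
```

Under this model, any shard with fewer than 3 reachable copies stays unassigned until the setting is lowered to 1, which matches the cluster turning yellow once a single copy was accepted.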



(vpunski) #6

This seems to be the same problem I had before (the link mentioned above
points to a thread I started).
I've found another problem that has the same source but much more
problematic consequences:
https://groups.google.com/group/elasticsearch/browse_thread/thread/93f5c9837275ec1f/7a766c37b0f9ae33?lnk=gst&q=initial_shards#7a766c37b0f9ae33

From my point of view, the system lacks a consistency check on startup
beyond the "initial_shards" parameter.
If the initial_shards requirement is satisfied but the content is broken, the
data is replicated to other nodes during startup, and the whole
data state becomes unpredictable...
If no RAID is used (and it shouldn't be needed, given the existing
replication logic at the application level), I'd like to have all the data
checksummed on startup, and the shard chosen not only by its last
update time but also by its checksum value.
Any ideas regarding this behaviour?
Should I open a new request for that?

Thanks
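
The checksum idea proposed here can be illustrated with a small Python sketch. It is purely hypothetical and does not describe anything Elasticsearch does: the function names, the directory layout, and the notion of a digest recorded before shutdown are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def directory_checksum(root: str) -> str:
    """SHA-256 over the relative names and contents of all files under root."""
    digest = hashlib.sha256()
    base = Path(root)
    # Sort so the digest does not depend on directory listing order.
    for path in sorted(p for p in base.rglob("*") if p.is_file()):
        digest.update(path.relative_to(base).as_posix().encode("utf-8"))
        with path.open("rb") as handle:
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
    return digest.hexdigest()


def is_intact(root: str, recorded_digest: str) -> bool:
    """Compare the current digest with one recorded earlier (hypothetical)."""
    return directory_checksum(root) == recorded_digest
```

A copy that fails `is_intact` would then be skipped when choosing which copy to recover from. Reading every file on startup is the cost of such a check, and it grows with the size of the data on disk.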



(Shay Banon) #7

You can enable a check of the index a shard uses on startup by
setting index.shard.check_on_startup to true. The problem is that this can
be expensive...



(vpunski) #8

Very cool feature!
After some code browsing I didn't find the index.shard.check_on_startup
parameter.
It seems to me its name is index.shard.check_index, defined in the
constructor of
org.elasticsearch.index.shard.service.InternalIndexShard.
Am I right?

I have one question remaining... I already tried to dive deep into the
code, but got lost.
If the parameter is set to true and the consistency check of a
particular shard fails,
what is the behaviour of the system from a shard recovery/relocation
perspective?

Very helpful answer,
Thanks


(Shay Banon) #9

On Tue, Nov 1, 2011 at 12:26 PM, vadim vpunski@gmail.com wrote:

> Very cool feature!
> After some code browsing I didn't find the index.shard.check_on_startup
> parameter.
> It seems to me its name is index.shard.check_index, defined in the
> constructor of
> org.elasticsearch.index.shard.service.InternalIndexShard.
> Am I right?

I renamed it in 0.18 (it was not a public setting, but it should be
documented in 0.18).

> I have one question remaining... I already tried to dive deep into the
> code, but got lost.
> If the parameter is set to true and the consistency check of a
> particular shard fails,
> what is the behaviour of the system from a shard recovery/relocation
> perspective?

It will mark the shard as failed and will try to allocate it to another
node.

