UNASSIGNED indexes

Hello,

Quick description of the problem: I had a single-node Elasticsearch cluster with
0 replicas for all indices, then added a second ES node to the cluster and set
all existing indices to have 1 replica (instead of 0).
After doing this, the _cat/shards page showed twice as many entries (shards, I
believe; sorry, I'm not that good with the ES terminology) and all the new ones
were UNASSIGNED. They then slowly started INITIALIZING (2 at a time) and moved
to the STARTED state on the second node. So that's good, but the problem is that
this was done about 2 weeks ago and some of them are still UNASSIGNED or
INITIALIZING. I've even noticed some of them going into the INITIALIZING state
and then falling back to UNASSIGNED. How can I properly track down the problem
here, and is there any way I can force the replicas to be allocated to the 2nd
node WITHOUT any data loss? I found only one way to allocate, but that includes
data loss, unfortunately.
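
(For reference, here is a minimal sketch, using the Python elasticsearch client, of how the shard states can be inspected and how a single replica can be manually assigned through the reroute API; "my_index" and "node2" are placeholder names, not from this cluster. Because only a replica is allocated, it is rebuilt from the primary, so this by itself should not cause data loss.)

from elasticsearch import Elasticsearch

es = Elasticsearch(["localhost:9200"])

# List every shard with its state (STARTED / INITIALIZING / UNASSIGNED)
print(es.cat.shards(v=True))

# Show recovery progress for shards that are still being copied
print(es.cat.recovery(v=True))

# Manually assign one UNASSIGNED replica to the second node.
# allow_primary stays False, so only a replica is allocated and it is
# rebuilt from the primary copy; nothing is discarded.
es.cluster.reroute(body={
    "commands": [
        {
            "allocate": {
                "index": "my_index",
                "shard": 0,
                "node": "node2",
                "allow_primary": False
            }
        }
    ]
})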

I don't have tons of data in there, so I don't see why it should take so long.
The largest index is around 500 MB, most of them are a few megabytes, and the
total number of indices is around 150-200.

Any help would be appreciated.


Take a look at your ES logs; they should have something of use.

You can also try dropping the replicas to 0 for the indices that are having
these problems, then re-add them one index at a time.
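
In case it helps, a minimal sketch of that settings change with the Python elasticsearch client ("my_index" stands in for one of the problem indices):

from elasticsearch import Elasticsearch

es = Elasticsearch(["localhost:9200"])

# Drop the replica for one problem index...
es.indices.put_settings(index="my_index",
                        body={"index": {"number_of_replicas": 0}})

# ...then add it back so the replica is rebuilt fresh from the primary.
es.indices.put_settings(index="my_index",
                        body={"index": {"number_of_replicas": 1}})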

On 2 April 2015 at 23:42, Darius dariusr88@gmail.com wrote:


Noticed this happening on a cluster this week that had reached 85% disk usage,
the default low disk watermark (the point at which ES stops allocating new
shards to that node).
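
If disk space turns out to be the culprit, something along these lines (Python elasticsearch client; the 90%/95% values are only an illustration, not a recommendation) shows per-node disk usage as the allocator sees it and temporarily raises the watermarks:

from elasticsearch import Elasticsearch

es = Elasticsearch(["localhost:9200"])

# Disk usage per node, as the allocation decider sees it
print(es.cat.allocation(v=True))

# Temporarily raise the watermarks so replicas can be allocated again.
# These thresholds are just an example; free up disk space afterwards.
es.cluster.put_settings(body={
    "transient": {
        "cluster.routing.allocation.disk.watermark.low": "90%",
        "cluster.routing.allocation.disk.watermark.high": "95%"
    }
})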

On Thursday, April 2, 2015 at 3:29:18 PM UTC-6, Mark Walkom wrote:


I'd like to thank everyone for their help & suggestions.
So the problem was an ES version mismatch. I had a cluster of two ES servers,
one running ES 1.4 and the other 1.3, and this is what prevented the index
replicas from being assigned to the second node. I've since updated both
servers to ES 1.4 and the issue is mostly resolved.

I have a new problem now! One index replica (out of 300) remained
UNASSIGNED. I also have an error which most likely shows the reason:

WARN ][cluster.action.shard ] [ES node 2] [domain.com][0] received
shard failed for [domain.com][0], node[vSC8rzYwTWiGVRxa_0lkpg], [R],
s[INITIALIZING], indexUUID [WEnzmEWmSMC80FaED_wXVQ], reason [engine
failure, message [corrupted preexisting
index][CorruptIndexException[[domain.com][0] Preexisting corrupted index
[corrupted_Ue2cAcsjSvmlFkFanPJp6Q] caused by: CorruptIndexException[codec
footer mismatch: actual footer=92416 vs expected footer=-1071082520 (resource:
NIOFSIndexInput(path="/stats/nodes/0/indices/domain.com/0/index/_og4c.fdt"))]

Has anyone, by any chance, had experience with errors like this, and is there a
possible solution that would not involve data loss? :)


I have had experience with this, but not without data loss. The reality is
that some data loss has already occurred. I am not aware of any ES solution
that will let you retrieve what data remains, without further data loss, and
restore the index to green status. I have seen references to people who, in
desperate situations, have resorted to writing Lucene code in order to extract
the data at the Lucene level and build a new index. I also suspect that if the
index is still searchable, you can perform a scan-and-scroll to retrieve what
records remain and build a new index.
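
A rough sketch of that scan-and-scroll approach, assuming the Python elasticsearch client and its helpers module; "broken_index" and "rebuilt_index" are placeholder names, and this only copies whatever documents are still readable:

from elasticsearch import Elasticsearch, helpers

es = Elasticsearch(["localhost:9200"])

def copy_docs(source_index, target_index):
    # Stream every document that can still be read from the source index
    # and bulk-index it into the target index.
    actions = (
        {
            "_index": target_index,
            "_type": hit["_type"],
            "_id": hit["_id"],
            "_source": hit["_source"],
        }
        for hit in helpers.scan(es, index=source_index,
                                query={"query": {"match_all": {}}})
    )
    return helpers.bulk(es, actions)

copy_docs("broken_index", "rebuilt_index")

Once the copy finishes, searches can be pointed at the new index (for example via an alias) before the damaged one is removed.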

It would seem that you need to take a backup of the index as it is and
proceed soon to rebuild that index, as in this degraded state the risk of
additional data loss is real.

I truly hope that Elasticsearch continues in the direction of making its tools
enterprise-ready, including building some data recovery tools for this very
situation. When it happened to me, I got lucky and no longer needed the index
that was corrupted. I also have a source of data that will allow me to rebuild
my indices at any point with an Amazon EMR job.

On Tue, Apr 7, 2015 at 2:46 AM, Darius dariusr88@gmail.com wrote:
