I had a problem with corrupted shards, so I restarted my cluster with
"index.shard.check_on_startup: fix" and the corrupted shards were fixed
(i.e. deleted). Unfortunately, the replicas and primaries then had differing
numbers of documents despite the indices all being green. Fortunately, the
primaries always had more documents than the replicas, so hopefully I haven't
lost anything.
To fix this I set the number of replicas to 0 and then back to 1 on all the
indices that had mismatches. Is there a better technique? I really didn't like
having just one copy of my data, even if it was only for a short time.
I am still running 1.1.1; is this addressed by a later release?
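For reference, this is roughly what I ran to rebuild the replicas (the index name below is just a placeholder):

curl -XPUT 'localhost:9200/my-index/_settings' -d '{
  "index": { "number_of_replicas": 0 }
}'
curl -XPUT 'localhost:9200/my-index/_settings' -d '{
  "index": { "number_of_replicas": 1 }
}'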
Hi Michael,
The fix option of check_on_startup checks the indices and removes the segments
that are corrupted. This is a Lucene-level operation and is primarily meant to
be used in extreme cases where you only had one copy of the shards and it got
corrupted.
In your case, since the primaries are good, the easiest would be to use the
reroute API to tell Elasticsearch to move the replicas that have been
corrupted to another node. When moving a replica, ES actually makes a new copy
from the primary, which protects against exactly these kinds of situations:
see the cluster reroute documentation (reference/current/cluster-reroute.html#cluster-reroute).
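A move command looks roughly like this (index, shard number and node names are placeholders):

curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands": [
    { "move": {
        "index": "my-index",
        "shard": 0,
        "from_node": "node_with_bad_replica",
        "to_node": "some_other_node"
    } }
  ]
}'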
Cheers,
Boaz
Thanks, I didn't think of moving the shards. That should have been faster as
well.
Moving the shard was a good idea but unfortunately:
{
  "error": "ElasticsearchIllegalArgumentException[[move_allocation] can't move [ds_infrastructure-storage-na-qtree][0], shard is not started (state = INITIALIZING]]",
  "status": 400
}
Allocate didn't work either, as the shard was not unallocated.
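For what it's worth, I was checking the shard states with something like this (index name is a placeholder):

curl 'localhost:9200/_cat/shards/my-index?v'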
Hmm, yeah, I can now see that in the code. Another option is to use the
allocation filtering API to move the shard off the node, and then remove the
rule once it is done.
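Roughly something like this (the node name is a placeholder); clear the setting again once the shard has relocated:

curl -XPUT 'localhost:9200/my-index/_settings' -d '{
  "index.routing.allocation.exclude._name": "node_with_bad_replica"
}'

curl -XPUT 'localhost:9200/my-index/_settings' -d '{
  "index.routing.allocation.exclude._name": ""
}'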