Previous posts have suggested that it's possible to back up/migrate/
restore indices from one cluster to another via a manual copy. I tested
this yesterday and ran into some problems.
Setup:
My test/reference cluster ("infinite-aws") consists of 3 nodes (on
AWS, default gateway settings). It has 8 indices: 3 have 1 shard with
2 replicas; of the remaining 5, 3 have 5 shards and 2 have 10 shards,
each with 1 replica.
I wanted to test copying this set of indices into a different cluster
("infinite-dev") consisting of just 1 node.
I am running ES 0.16.2.
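(For reference, an index of the 5-shard/1-replica shape would have been
created with something along these lines; the index name "some_index"
is illustrative:)
curl -XPUT 'http://localhost:9200/some_index' -d '{
  "settings": { "number_of_shards": 5, "number_of_replicas": 1 }
}'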
Steps:
I created a tar of the "/data/infinite-aws" directory (with
"disable translog flush" set to true) on one of the 3 "infinite-aws"
nodes, and scp'd it across to "infinite-dev".
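(Roughly, the backup side looked like this; I'm assuming
"index.translog.disable_flush" is the right 0.16 setting name, and the
paths/user are illustrative:)
# stop translog flushing so the files on disk stay stable during the copy
curl -XPUT 'http://localhost:9200/_all/_settings' -d '{ "index": { "translog.disable_flush": true } }'
# tar the cluster's data directory on one node and copy it across
tar cvf index_backup_most_recent.tar data/infinite-aws
scp index_backup_most_recent.tar user@infinite-dev:elasticsearch/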
I then ran the following script on the "infinite-dev" node (with ES
stopped, from the ES home directory):
tar xvf index_backup_most_recent.tar
rm -rf data/infinite-dev
mv data/infinite-aws data/infinite-dev
Then I restarted ES. At this point I obviously expect the status to be
red, since the indices have more replicas configured than a single node
can host, so I run:
curl -XPUT 'http://localhost:9200/_all/_settings' -d '{ "index":
{ "number_of_replicas": 0 } }'
(I also re-enabled translog flushing and deleted node.lock, though I
assume both are reset by restarting ES anyway.)
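(I then watch the recovery with a health check along these lines:)
curl -XGET 'http://localhost:9200/_cluster/health?pretty=true'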
This gets all but 2 indices working (one 5-shard index and one 10-shard
index). Two shards of the 5-shard index remain unassigned, and all
shards of the 10-shard index remain unassigned, so the status obviously
remains red.
Looking at the status for those 2 indices (e.g. from the overview page
of ES-head), I note that the unassigned shards are not listed (e.g. the
status for the 10-shard index just reports "null"). In the overall
cluster health, the missing shards are listed as unassigned, with all
relocating fields set to null.
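(The same information is visible in the routing table from the cluster
state API, assuming I'm reading the 0.16 output correctly:)
curl -XGET 'http://localhost:9200/_cluster/state?pretty=true'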
Looking at the distribution of the failing indices in the original
cluster, I note that 1 of the 3 nodes (in fact the master, and the node
whose data directory I tarred up) has a copy of every shard of every
index apart from the 12 failed shards across the 2 indices; those 12
are distributed across the other 2 nodes (as an aside, is that slightly
dubious balancing expected?). Since the copied data directory never
contained those 12 shards, this would appear to be the root of the
problem.
I tried opening and closing the indices, restarting the node, and
just waiting a long time, all to no avail.
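(For completeness, the open/close attempts were of this form, using one
of the failing indices:)
curl -XPOST 'http://localhost:9200/doc_4db5c05fb246d25364aceca0/_close'
curl -XPOST 'http://localhost:9200/doc_4db5c05fb246d25364aceca0/_open'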
So:
- Is what I'm trying to do (I guess specifically the bit about copying
by hand from a 3-node cluster to a 1-node cluster) supported?
- If so, are any migration steps missing/wrong?
- If this flavor of manual migration is not supported between
clusters of different sizes, is there an alternative?
*** Some more details:
The INFO log reports nothing interesting, e.g.:
2011-06-14 14:57:18.104 [INFO] gateway:79 - [Spiral] recovered [8]
indices into cluster_state
2011-06-14 15:04:15.552 [INFO] cluster.metadata:79 - [Spiral] Updating
number_of_replicas to [0] for indices [doc_4dd53fb4e40d93afb096c484,
event_index, doc_4c927585d591d31d7c37097b, document_index, doc_dummy,
gazetteer_index, doc_4db5c05fb246d25364aceca0,
doc_4c927585d591d31d7b37097a]
(doc_4db5c05fb246d25364aceca0, doc_4c927585d591d31d7b37097a are the 2
failing indices)
At the DEBUG level you additionally get reports like:
2011-06-15 08:35:19.782 [DEBUG] gateway.local:71 - [Guido Carosella]
[doc_4db5c05fb246d25364aceca0][4]: not allocating,
number_of_allocated_shards_found [0], required_number [1]
2011-06-15 08:35:19.782 [DEBUG] gateway.local:71 - [Guido Carosella]
[doc_4c927585d591d31d7b37097a][0]: not allocating,
number_of_allocated_shards_found [0], required_number [1]
I can provide any other details that might be helpful.
Thanks, as always, for anyone's/everyone's insight,
Alex
(I'm always hoping to see a question I can answer, so I can help out,
but someone - usually Shay - always beats me to it!)