Feature Request: Reuse Work Directory

Shay,
Cassandra (and I think Hadoop too) definitely recovers from local storage
first and then fetches only the missing data from other nodes.
Solr replicas (like any other system based on replication) also recover
from local disk and then roll changes forward until they are in sync
with the master.

I don't see why an ES shard can't recover from local disk using its own log
position and then replay the transaction log from the master to pick up
the changes it missed.
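
To make the idea concrete, here is a rough sketch in Python of what I mean
by "recover locally, then roll the log". It is not ES code: the Shard class,
the log_position field and the (seq, op, doc_id, body) log format are all
made up for illustration, and it assumes the master still holds every
operation after the replica's position and that the two histories never
diverged, which is exactly the part that is hard in a real cluster.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """Hypothetical in-memory stand-in for a shard; not an Elasticsearch API."""
    docs: dict = field(default_factory=dict)
    log_position: int = 0  # sequence number of the last operation applied


def catch_up(replica: Shard, primary_log: list) -> int:
    """Apply only the operations the replica has not seen yet.

    primary_log is a list of (seq, op, doc_id, body) tuples with
    increasing seq. Returns the number of operations replayed.
    """
    replayed = 0
    for seq, op, doc_id, body in primary_log:
        if seq <= replica.log_position:
            continue  # already reflected in the local copy
        if op == "index":
            replica.docs[doc_id] = body
        elif op == "delete":
            replica.docs.pop(doc_id, None)
        replica.log_position = seq
        replayed += 1
    return replayed


# The replica restarts with its on-disk state as of operation 2.
replica = Shard(docs={"a": "v1", "b": "v1"}, log_position=2)
primary_log = [
    (1, "index", "a", "v1"),
    (2, "index", "b", "v1"),
    (3, "index", "a", "v2"),
    (4, "delete", "b", None),
]
replayed = catch_up(replica, primary_log)
```

Only operations 3 and 4 cross the wire here; the rest of the data is
already on the local disk.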

I understand that ES was planned to run on cloud platforms, where you
switch servers more frequently. But even in the cloud, if you just want to
upgrade code or take the service down for a minute, there is no need to
move gigabytes of data over the wire. It just doesn't make sense.

On Tue, Jun 1, 2010 at 11:25 PM, Shay Banon shay.banon@elasticsearch.com wrote:

Also, a shard reads its data from the gateway only the first time it is
instantiated. Otherwise, it recovers its state from a replica (which
still takes its toll, but is manageable).

By the way, do you have the same problem with Cassandra / Hadoop doing the
same? Because they do :)

On Tue, Jun 1, 2010 at 11:16 PM, Shay Banon shay.banon@elasticsearch.com wrote:

This can't be done due to consistency issues in distributed systems.

On Tue, Jun 1, 2010 at 5:21 PM, Yatir Ben Shlomo yatirb@gmail.com wrote:

Hi,
If I am using a large index (tens of GBs) and a cloud node restarts,
it takes a long time for the node to read the whole index from the
gateway.

In cloud environments this is understandable, but if the node is
running as part of my LAN, I would be happy if I could configure the
machine to reuse the files in its work directory (if they exist) and,
in the background, do whatever it takes to synchronize with the data
on the gateway. Until the data is synchronized, let the node use its
local files.
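
One possible way to decide which work-directory files can be reused is
to compare them against a checksum listing from the gateway. The Python
sketch below is only an illustration under that assumption: the
gateway_manifest dictionary (file name to SHA-256 digest) and the
plan_recovery function are hypothetical and do not correspond to any
existing ES setting or API.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def plan_recovery(work_dir: Path, gateway_manifest: dict) -> tuple:
    """Split gateway files into those reusable locally and those to fetch.

    gateway_manifest maps file name -> expected SHA-256 digest
    (a hypothetical format, assumed for this sketch).
    """
    reuse, fetch = [], []
    for name, digest in sorted(gateway_manifest.items()):
        local = work_dir / name
        if local.is_file() and file_digest(local) == digest:
            reuse.append(name)   # identical copy already on local disk
        else:
            fetch.append(name)   # missing or stale, pull from the gateway
    return reuse, fetch


with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp)
    (work / "seg_1").write_bytes(b"unchanged segment")
    (work / "seg_2").write_bytes(b"stale segment")
    manifest = {
        "seg_1": hashlib.sha256(b"unchanged segment").hexdigest(),
        "seg_2": hashlib.sha256(b"newer segment").hexdigest(),
        "seg_3": hashlib.sha256(b"segment missing locally").hexdigest(),
    }
    reuse, fetch = plan_recovery(work, manifest)
```

In this example only seg_2 and seg_3 would have to be read from the
gateway; seg_1 is served from the work directory right away.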

--
http://olahav.typepad.com