Feature Request: Reuse Work Directory


(Yatir Ben Shlomo-3) #1

Hi,
If I am using a large index (tens of GBs) and a cloud node restarts, it takes
a long time for the node to read the whole index from the gateway.

In cloud environments this is understandable, but if the node is running as
part of my LAN, I would be happy if I could configure the machine to reuse
the files in its work directory (if they exist) and, in the background, do
whatever it takes to synchronize with the data on the gateway. Until the data
is synchronized, let the node use its local files.


(Shay Banon) #2

This can't be done due to consistency issues in distributed systems.


(Shay Banon) #3

Also, a shard will read its data from the gateway only the first time it
gets instantiated. Otherwise, it will recover its state from a replica
(which still takes its toll, but is manageable).

By the way, do you have the same problem with Cassandra / Hadoop doing the
same? Because they do :)


(Ori Lahav) #4

Shay,
Cassandra (and I think Hadoop too) definitely recovers from local storage and
then fetches only the missing data from other nodes.
Solr replicas (like any other replication-based system) also recover from
local disk and then roll changes forward until they are in sync with the
master.

I don't see why an ES shard can't recover from local disk using its own log
position and then replay the transaction log for changes from the master.

I understand that ES was planned to run on cloud solutions, where you switch
servers more frequently. But even in the cloud, if you just want to upgrade
code or take the service down for a minute, there is no need to move
gigabytes of data over the wire. It just doesn't make sense.
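
To make the proposal concrete, here is a minimal Python sketch of catching up from a saved log position. Everything in it (`Op`, `LocalShard`, `catch_up`, the sequence numbers) is hypothetical and is not elasticsearch API. It also assumes the local copy is a valid prefix of the master's history and that the master still holds every operation after the local position.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Op:
    seq: int                 # position in the master's transaction log
    doc_id: str
    source: Optional[dict]   # None marks a delete


@dataclass
class LocalShard:
    docs: dict = field(default_factory=dict)
    last_seq: int = 0        # last log position already applied locally

    def apply(self, op: Op) -> None:
        if op.source is None:
            self.docs.pop(op.doc_id, None)
        else:
            self.docs[op.doc_id] = op.source
        self.last_seq = op.seq


def catch_up(shard: LocalShard, master_log: list) -> int:
    """Replay only the operations the local copy has not seen yet."""
    replayed = 0
    for op in master_log:
        if op.seq > shard.last_seq:
            shard.apply(op)
            replayed += 1
    return replayed


log = [
    Op(1, "a", {"title": "first"}),
    Op(2, "b", {"title": "second"}),
    Op(3, "a", None),
    Op(4, "c", {"title": "third"}),
]
# The node went down after applying operation 2.
shard = LocalShard(
    docs={"a": {"title": "first"}, "b": {"title": "second"}},
    last_seq=2,
)
replayed = catch_up(shard, log)
print(replayed, sorted(shard.docs))  # 2 ['b', 'c']
```

Only two operations cross the wire here instead of the whole index, which is the saving being asked for.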


(Shay Banon) #5

It's very simple. Cassandra defers that overhead to read repair, so if your
node is down long enough, you end up doing the same work piggybacked on the
user's API calls. And I am putting aside the fact that you can lose
operations: if a node ("shard") is not available to route to, it writes
locally and retries later (unless something has changed, that's what I
know). Hadoop does something similar to what elasticsearch does.

There is no read repair in elasticsearch. It can be implemented of course,
like anything else in software, but it would kill your performance: search
engines provide an order of magnitude more query capabilities than
Cassandra, hence the difficulty.

Read repair is not an option, and in a distributed system you can't tell
which node went down last or what the state of the cluster was when it was
brought down. For example, you bring down machine A and, 10 minutes later,
bring down machine B, while the system is actively indexing data. Then you
start machine A and, 10 minutes later, start machine B. You can basically
lose most of the data that was indexed on machine B. That is why
elasticsearch relies on external storage to provide the master record when
it boots.
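
The A/B scenario can be traced with a toy Python model. This is only an illustration under a simplifying assumption: whichever machine starts first is trusted as the authoritative state, and a machine that joins later is synchronized to it. The function name and the document ids are made up for the example.

```python
def simulate_staggered_restart():
    """Toy model: each machine's local files are a set of document ids."""
    machine_a = set()
    machine_b = set()

    # Both machines are up, so every write reaches both copies.
    for doc in ("doc1", "doc2"):
        machine_a.add(doc)
        machine_b.add(doc)

    # A is brought down; B keeps indexing on its own for 10 minutes.
    for doc in ("doc3", "doc4", "doc5"):
        machine_b.add(doc)

    # B is brought down too. A starts first and, in this toy model,
    # its local files are trusted as the authoritative state.
    cluster_state = set(machine_a)

    # B starts 10 minutes later and is synchronized to the cluster state,
    # discarding anything the cluster does not know about.
    lost = machine_b - cluster_state
    machine_b = set(cluster_state)
    return cluster_state, lost


state, lost = simulate_staggered_restart()
print(sorted(state))  # ['doc1', 'doc2']
print(sorted(lost))   # ['doc3', 'doc4', 'doc5']
```

Nothing in A's local files says they are stale, which is why an external master record is needed to decide which copy wins.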

Having said that, recovery from the gateway can be (and should be) heavily
optimized, which is on my "roadmap". There is no reason to fetch from the
gateway "files" that already existed when the node was brought down: because
files in search engines are write-once, they can be reused. It's a bit
tricky, but it can be done.
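
A rough Python sketch of that file-reuse idea, assuming the gateway can provide a listing of file names with checksums. The listing format, the SHA-256 comparison, the helper names, and the segment file names are all assumptions made for illustration; this is not how elasticsearch implements it.

```python
import hashlib
import tempfile
from pathlib import Path


def checksum(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def files_to_fetch(local_dir: Path, gateway_listing: dict) -> list:
    """Return gateway file names whose local copy is missing or differs."""
    needed = []
    for name, expected in sorted(gateway_listing.items()):
        local = local_dir / name
        if not local.is_file() or checksum(local) != expected:
            needed.append(name)
    return needed


with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp)
    (work / "_0.cfs").write_bytes(b"segment zero")   # identical to gateway
    (work / "_1.cfs").write_bytes(b"stale bytes")    # differs from gateway
    gateway = {
        "_0.cfs": hashlib.sha256(b"segment zero").hexdigest(),
        "_1.cfs": hashlib.sha256(b"segment one").hexdigest(),
        "_2.cfs": hashlib.sha256(b"segment two").hexdigest(),  # not local
    }
    to_fetch = files_to_fetch(work, gateway)
    print(to_fetch)  # ['_1.cfs', '_2.cfs']
```

Because segment files are write-once, a file whose checksum matches never needs to be copied again; only new or differing files are transferred.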

-shay.banon


(Ori Lahav) #6

I understand that.
I'm not against the notion of the gateway and the "single storage", but I
think a node should be able to recover just by replaying a log from the
gateway, rather than pulling all the tens of gigabytes from the gateway (or
another node) when it was down for only a minute.

If you already have a transaction log on the gateway, is there a barrier to
implementing this? (Apart from your time :))


(Ori Lahav) #7

Oh, BTW,
the way Solr replicates the Lucene index is by copying only the files that
have changed since the last replication.
That is also a good way to implement this: fetch only the index files that
have changed since the node was brought down.
It might be easier and more efficient for node recovery.

What do you say?


(Shay Banon) #8

That's similar to what elasticsearch does on an ongoing basis (when writing
to the gateway). But you run into the same problem I explained in my
previous email regarding losing data (on top of the fact that, until you
commit, you might lose data in Solr on a server crash / shutdown).

So what I explained in that last part is basically this: copy over from the
gateway only the files that have changed. (Note that this only happens when
the first shard within its replication group is created in the lifecycle of
the cluster; there is an effort to minimize chatter with the gateway.)
That's a bit tricky, since you can start several nodes on the same machine
and so on, so you need to know where to recover from, but it is doable. The
transaction log is important, but it does not solve the whole problem.

I already opened an issue for this:
http://github.com/elasticsearch/elasticsearch/issues/issue/206.

-shay.banon


(Ori Lahav) #9

Thanks, Shay.
IMHO it's a much-needed one.


(Shay Banon) #10

This has been implemented in 0.9.

-shay.banon

