Yeah, sure.
Is it possible to copy a gateway from S3 to local storage, run the conversion locally, and then copy it back to S3? That would solve the S3 rename issue, with less total downtime.
...Ken
Shay Banon wrote:
You can check what I do in the script; it's pretty simple. The problem is that S3 does not have a rename operation. A copy command could perhaps be used instead, but I'm not sure how long it takes for large files. You can replace the file-based operations with S3 ones.
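
Since S3 has no rename, the usual workaround is the one suggested above: copy the object to the new key, then delete the old one. The sketch below shows that pattern against a hypothetical in-memory `InMemoryObjectStore`, used as a stand-in for a bucket so it runs without credentials; the key names are made up for illustration. With boto3, the two steps would correspond to the S3 client's `copy_object` and `delete_object` calls. A real S3 copy is server-side, but a single CopyObject request is limited to 5 GB, so larger gateway files would need a multipart copy.

```python
class InMemoryObjectStore:
    """Hypothetical stand-in for an S3 bucket: flat keys, no rename primitive."""

    def __init__(self):
        self._objects = {}

    def put(self, key, data):
        self._objects[key] = bytes(data)

    def get(self, key):
        return self._objects[key]

    def copy(self, src_key, dst_key):
        # Server-side copy in real S3; here just a dictionary copy.
        self._objects[dst_key] = self._objects[src_key]

    def delete(self, key):
        self._objects.pop(key, None)

    def keys(self):
        return sorted(self._objects)


def emulate_rename(store, src_key, dst_key):
    """'Rename' on a store without rename: copy first, then delete the source.

    Copying before deleting means a crash between the two steps leaves a
    duplicate object rather than losing it.
    """
    if src_key == dst_key:
        return
    store.copy(src_key, dst_key)
    store.delete(src_key)


store = InMemoryObjectStore()
store.put("index/shard0/old-name", b"segment bytes")
emulate_rename(store, "index/shard0/old-name", "index/shard0/__0")
print(store.keys())  # ['index/shard0/__0']
```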
-shay.banon
On Thu, Aug 26, 2010 at 2:22 AM, Kenneth Loafman <kenneth.loafman@gmail.com> wrote:
Any hints on how to upgrade an S3 gateway?
...Ken
Shay Banon wrote:
Yes, here it is: http://gist.github.com/546494

-shay.banon

On Wed, Aug 25, 2010 at 8:46 PM, Paul <ppearcy@gmail.com> wrote:

Hi Shay,
Just curious, is the conversion script for the file-based gateway available? I haven't searched through Git, so maybe it is there?

I spent the weekend building up 20 million docs and would prefer not to repeat that.

Thanks,
Paul

On Aug 24, 4:34 pm, Shay Banon <shay.ba...@elasticsearch.com> wrote:
Indeed. All the __xxx files are binary files (either index files or transaction log parts). Each commit-N file is a JSON file that provides metadata about the commit point.

On Wed, Aug 25, 2010 at 1:32 AM, Grant Rodgers <gra...@gmail.com> wrote:
I was very curious to see what the file format was!

On Aug 24, 3:05 pm, Shay Banon <shay.ba...@elasticsearch.com> wrote:
Great, thanks for validating that. I did not expect anyone to open vi on the gateway ;)

On Wed, Aug 25, 2010 at 12:18 AM, Grant Rodgers <gra...@gmail.com> wrote:
I think it's fixed in head; I didn't see this error again when repeating the test below.

On Aug 24, 1:12 pm, Grant Rodgers <gra...@gmail.com> wrote:
Oh, you know, it was probably a vi swap file. I was looking at one of the commit logs, and elasticsearch might have tried to snapshot while it was open.

I think you have committed another change since 80c7135 that ignores files elasticsearch didn't create. I'll build the latest head and try viewing a commit snapshot again.

On Aug 24, 1:00 pm, Shay Banon <shay.ba...@elasticsearch.com> wrote:

Strange, I'm not sure how this file ended up in the gateway; I can't see where elasticsearch would write it. It only writes __xxx files (with no .something extension) and commit- files. I will fix it to ignore files that don't conform to the format, but we should try to understand where it's coming from...
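
The "ignore files that don't conform to the format" fix can be pictured as a name filter over the gateway listing. This is a sketch under assumptions: the patterns are inferred from the description above (`__` followed by a name with no dot, and `commit-` followed by a number), and `gateway_files` is a hypothetical helper, not elasticsearch code.

```python
import re

# Assumed naming rules: data blobs are "__" plus a name with no dot,
# commit points are "commit-" plus a number.
DATA_BLOB = re.compile(r"__[^.]+")
COMMIT_POINT = re.compile(r"commit-\d+")


def gateway_files(names):
    """Split a gateway listing into (blobs, commits), skipping anything else."""
    blobs, commits = [], []
    for name in names:
        if DATA_BLOB.fullmatch(name):
            blobs.append(name)
        elif COMMIT_POINT.fullmatch(name):
            commits.append(name)
        # Any other name (e.g. a vi swap file) is ignored rather than
        # treated as part of the gateway.
    return blobs, commits


listing = ["__0", "__1a", "commit-3", ".commit-3.swp", "__2.tmp"]
print(gateway_files(listing))  # (['__0', '__1a'], ['commit-3'])
```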

-shay.banon

On Tue, Aug 24, 2010 at 10:25 PM, Grant Rodgers <gra...@gmail.com> wrote:
Oh, by the way, that error was with 80c7135.

On Aug 23, 12:22 pm, Shay Banon <shay.ba...@elasticsearch.com> wrote:
Yes, though a whole set of "workflow"-level APIs and support is still needed for that. But the basics are there in the gateway.

-shay.banon

On Mon, Aug 23, 2010 at 10:21 PM, Tal <talsalm...@gmail.com> wrote:
Very nice feature indeed.
Will this also allow for index-level commit points?

Tal

On Aug 23, 10:11 pm, Shay Banon <shay.ba...@elasticsearch.com> wrote:
Hi,

I am going to push (pretty soon) a major rewrite of the gateway module, along with improved throttling support. The gateway change is a breaking change: the new version will not be able to recover from a 0.9 gateway. I will provide an upgrade script for the file-based gateway. The S3-based gateway will require reindexing, though that script could potentially be adjusted to support it.

Let me explain some of the changes. The first is throttling support. In 0.9, recoveries are throttled per node in order to reduce the load on that node. The throttling was done at the node level, after a shard had already been allocated to it. Maintaining the count of current recoveries there is quite tricky because of the complexity of the recovery process. This has now been refactored into a better place: the allocation algorithm itself, which runs and shuffles shards around.
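
To illustrate moving throttling into the allocation step, here is a toy allocator that refuses to place a shard on a node already running its quota of recoveries and defers it to a later round instead. The cap `MAX_CONCURRENT_RECOVERIES_PER_NODE`, the least-busy-node rule and all names are hypothetical; the thread does not describe the real algorithm's details.

```python
from collections import Counter

# Hypothetical cap; the thread does not give the actual setting or default.
MAX_CONCURRENT_RECOVERIES_PER_NODE = 2


def allocate(unassigned, nodes, recovering):
    """One allocation round that throttles recoveries before assignment.

    unassigned: shard ids waiting for a node
    nodes:      candidate node names
    recovering: Counter of in-flight recoveries per node (updated in place)
    Returns (assignments, deferred).
    """
    assignments = {}
    deferred = []
    for shard in unassigned:
        # Only nodes that still have recovery capacity are candidates.
        candidates = [n for n in nodes
                      if recovering[n] < MAX_CONCURRENT_RECOVERIES_PER_NODE]
        if not candidates:
            deferred.append(shard)  # retried in a later round
            continue
        # Least busy node first; ties broken by name for determinism.
        node = min(candidates, key=lambda n: (recovering[n], n))
        assignments[shard] = node
        recovering[node] += 1
    return assignments, deferred


in_flight = Counter()
done, waiting = allocate(["s0", "s1", "s2", "s3", "s4"], ["node-a", "node-b"], in_flight)
print(done)     # {'s0': 'node-a', 's1': 'node-b', 's2': 'node-a', 's3': 'node-b'}
print(waiting)  # ['s4']
```

Because the check happens before a shard is assigned, a node never has to count or reject recoveries after the fact, which is the difficulty described above with node-level throttling.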

The more interesting change is the gateway. Several problems with how the gateway currently works were exposed by an elasticsearch user who stores 4TB of data (several indices, each with 10 shards and 2 replicas, which adds up to 12TB). This uncovered some problems with the current design, specifically how MD5 checksums are computed (and the time it takes to compute them on EC2 local storage), as well as other ways the gateway can become corrupted under this load. Of course, elasticsearch aims to store much more data than that; we are getting there...

In general, the new implementation works (in spirit) the same way git does. Each snapshot is a commit point: files are stored in the gateway under auto-generated names, and finally a commit point is written containing the "directory", which maps each generated (pseudo) name to the physical file name and its size. The new design is more resilient to corruption. It also enables exciting future features, like saving a commit point and restoring from it, or automatically creating a commit point each day for the last 5 days and being able to roll back to a specific one.

The aim is to create a gateway storage format that will be the final version and resilient to future changes. It will take some time to get there, but once we do, I can confidently stand behind using elasticsearch as the main data store, as well as releasing v1.0 (I think elasticsearch has enough features for 1.0; only the stability of the gateway is still needed).

I would love for people to take this for a ride and check it out. As a result, the next version is going to be 0.10, and I will release it in the coming days.

-shay.banon