Persistent transaction log

Hi,

Is there a way to somehow persist the transaction log so it can be "replayed"
at a later time?
I think that in many situations where re-indexing is required (as a
result of some catastrophe) it would be nice to be able to replay the
transaction log (maybe from the last X hours).

A more specific example: I use Elasticsearch to index some content from
the database. I have triggers that fire an index request when
something is updated in the database (it's important for me to have
the data in ES reflect the database as closely as possible).
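
To make the sync path concrete, here is roughly what each trigger ends up
firing. This is only a sketch: the host, the "items"/"item" index and type
names, and the document fields are placeholders for my real setup.

import json
import urllib.request

# Push one updated row into ES so the index tracks the database.
# Index/type names and the document shape are placeholders.
def index_row(row_id, fields):
    url = "http://localhost:9200/items/item/%s" % row_id
    body = json.dumps(fields).encode("utf-8")
    req = urllib.request.Request(url, data=body, method="PUT")
    req.add_header("Content-Type", "application/json")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# e.g. index_row(42, {"title": "foo", "updated_at": "2010-08-09T07:27:00Z"})
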
To support disaster recovery I back up the database and snapshot the ES
gateway files every once in a while.
When I want to recover, I restore the database and the gateway.
The problem is that the database backup and the gateway snapshot were not
created at exactly the same time, so there may be some data missing
from ES.
What I do to solve this is re-index (from the database) the last X
hours' worth of changes to make sure ES and the database are in sync.
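
For clarity, that catch-up step is roughly the sketch below. The table,
columns, database driver and index name are placeholders for my actual
schema, not a real implementation.

import json
import sqlite3  # stand-in for whatever database is actually used
import urllib.request
from datetime import datetime, timedelta

ES_URL = "http://localhost:9200/items/item/%s"

# Re-index every row the database says changed in the last `hours` hours.
# Assumes updated_at is stored as an ISO-8601 string (placeholder schema).
def reindex_last_hours(db_path, hours):
    cutoff = (datetime.utcnow() - timedelta(hours=hours)).isoformat()
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT id, title, updated_at FROM items WHERE updated_at >= ?",
        (cutoff,))
    for row_id, title, updated_at in rows:
        doc = json.dumps({"title": title, "updated_at": updated_at})
        req = urllib.request.Request(ES_URL % row_id,
                                     data=doc.encode("utf-8"), method="PUT")
        req.add_header("Content-Type", "application/json")
        urllib.request.urlopen(req).read()
    conn.close()

# e.g. reindex_last_hours("app.db", hours=6) right after restoring both backups
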
After this long story... I was thinking that it would be nicer if I
could somehow replay a persistent transaction log instead of hitting
the database to get the data.
Is something like this possible?

Thanks,
Tal

Hi,

Elasticsearch does not store a boundless transaction log; it only keeps the
transaction log until the next flush is executed (changes are "committed" to
the index). By default, it also does a flush every 5000 operations (per
shard).

In general, you should let the snapshotting take place at a short
interval (by default it's 10 seconds). Snapshots are deltas, so in the
general case there won't be a lot of data to sync.
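
If you want the gateway to be as close as possible to your database backup,
you can also force a flush and a gateway snapshot yourself right before you
take the backup. A rough sketch over the REST API; the host and index name
are placeholders, and the exact endpoints can differ between versions.

import urllib.request

BASE = "http://localhost:9200/items"  # placeholder index

def post(path):
    # Empty-body POST to an Elasticsearch admin endpoint.
    req = urllib.request.Request(BASE + path, data=b"", method="POST")
    return urllib.request.urlopen(req).read()

post("/_flush")             # commit translog changes to the index
post("/_gateway/snapshot")  # snapshot the committed state to the gateway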

-shay.banon

I see.
OK, thanks.

Tal
