How long do deleted documents remain in the index?

Hi all,

In a previous question (https://groups.google.com/d/msg/elasticsearch/IU0b09LYs98/Z8ysJe9wdmEJ),
I asked how long it takes for documents to get expunged from the index.
I was pointed to index.gc_deletes on SO (http://stackoverflow.com/q/17861268/178526)
and created #3396 (https://github.com/elasticsearch/elasticsearch/issues/3396) thereafter.
However, I still didn't get an answer as to whether this setting guarantees
that the document stays available for that time, or whether it's merely the maximum.

So does index.gc_deletes guarantee that a version will be remembered for
the configured time? Or may merges/optimizations throw it away regardless?

On a higher level, I'm looking for this answer: how can I guarantee that in
my use case (https://github.com/molindo/molindo-elasticsync) deletes won't
be overwritten by out-of-order index operations for an indefinite or
predefined time?

Thanks, Stefan

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Stefan,

Nothing is guaranteed. But in general, the time it takes for a deleted
document to be truly gone is close to the configured time.

I seem to recall a recent discussion that led to a bug-tracker issue about
a document whose time to live has gone negative but which still shows up.

If you update an existing document that is scheduled for deletion but its
time-to-live has not yet expired (that is, it's still positive), the
document can be updated and its time-to-live can also be changed. I have
test cases that exercise this function, and it works very well.

On a side note: If you want fully deterministic cross-site record locking,
Elasticsearch won't be the solution. In this case:

  1. Use Oracle.
  2. Run for the hills as fast as you can!

Or better yet, design the application so that it doesn't need any behavior
relating to fully deterministic cross-site record locking. In general, ES
feels as if they backed off from being 100% deterministic and that's what
lets them achieve stellar results and great performance. Those that try for
the 100% usually crash and burn.

Brian

(In reply to Stefan Fußenegger's message of Monday, July 29, 2013 4:47:51 AM UTC-4.)

Stefan,

Here's a link to the topic relating to negative TTL (time to live) values
in expired documents:

https://groups.google.com/d/topic/elasticsearch/ifvWZJjQuvU/discussion

Here's a link to the topic that starts with my initial TTL question,
contains a lot of very helpful answers and comments, and ends with a
working example:

https://groups.google.com/d/topic/elasticsearch/4-_lNiP8mps/discussion

By the way, when I mentioned that "nothing is guaranteed", I should have
stated that the document will live at least as long as its TTL (time to
live) value, but how much longer it lives is the part that's not currently
deterministic.

In other words, a document with a TTL of 5 minutes will live at least
another 5 minutes. But how much longer it lives past those 5 minutes is
currently not deterministic (though it should become more deterministic per
the information from the first link above).

Brian

Brian,

Thanks for your effort, but I'm not talking about expired documents (as in
TTL) but about documents that were actually deleted manually (as described in
my previous question,
https://groups.google.com/d/msg/elasticsearch/IU0b09LYs98/Z8ysJe9wdmEJ).
Deleted documents are still considered for optimistic locking purposes. By
default, they are fully expunged after 60s (index.gc_deletes). But as this
behaviour/setting isn't well documented, I'm looking for answers here.

This is useful if operations appear out of order (e.g. index v1, delete v3,
index v2). As I said, it does work. All I'm trying to figure out is how
reliable it is or what other settings I might temporarily change (e.g.
disabling optimizations).
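
To make the scenario concrete, here is a minimal Python sketch of the behaviour described above. It is a toy model and not Elasticsearch code: the class ToyVersionedIndex, its methods, and the explicit `now` argument are made up for illustration. It also simply assumes that a delete tombstone survives for the whole gc_deletes window, which is exactly the guarantee being asked about here, so treat it as a picture of the question rather than an answer.

```python
class ToyVersionedIndex:
    """Toy model of external versioning with delete tombstones.

    Hypothetical illustration only; this is not how Elasticsearch is implemented.
    """

    def __init__(self, gc_deletes_seconds=60.0):
        self.gc_deletes_seconds = gc_deletes_seconds
        self.live = {}        # doc_id -> (version, source)
        self.tombstones = {}  # doc_id -> (version, deleted_at)

    def _current_version(self, doc_id, now):
        """Return the version the index still remembers, or None."""
        if doc_id in self.live:
            return self.live[doc_id][0]
        tombstone = self.tombstones.get(doc_id)
        if tombstone is not None:
            version, deleted_at = tombstone
            if now - deleted_at < self.gc_deletes_seconds:
                return version
            # Assumed behaviour: once the window has passed, the delete is forgotten.
            del self.tombstones[doc_id]
        return None

    def index(self, doc_id, version, source, now):
        """Accept the write only if its version is newer than what is remembered."""
        current = self._current_version(doc_id, now)
        if current is not None and version <= current:
            return False  # version conflict: stale write rejected
        self.tombstones.pop(doc_id, None)
        self.live[doc_id] = (version, source)
        return True

    def delete(self, doc_id, version, now):
        """Delete the document and remember the delete's version as a tombstone."""
        current = self._current_version(doc_id, now)
        if current is not None and version <= current:
            return False  # version conflict: stale delete rejected
        self.live.pop(doc_id, None)
        self.tombstones[doc_id] = (version, now)
        return True


# Case 1: the late "index v2" arrives 29 seconds after "delete v3".
within = ToyVersionedIndex(gc_deletes_seconds=60.0)
within.index("doc", 1, {"title": "v1"}, now=0.0)
within.delete("doc", 3, now=1.0)
accepted_within_window = within.index("doc", 2, {"title": "v2"}, now=30.0)

# Case 2: the same late write arrives 99 seconds after the delete.
after = ToyVersionedIndex(gc_deletes_seconds=60.0)
after.index("doc", 1, {"title": "v1"}, now=0.0)
after.delete("doc", 3, now=1.0)
accepted_after_window = after.index("doc", 2, {"title": "v2"}, now=100.0)
```

In this model the late write is rejected in case 1 and accepted in case 2, so in case 2 the delete is effectively overwritten by the stale v2. Whether a merge or optimize can make the real index behave like case 2 before the window has elapsed is the open question.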

My use case is a tool that synchronizes indices to facilitate upgrades with
zero downtime (https://github.com/molindo/molindo-elasticsync).

Thanks, Stefan

My apologies, Stefan. I found your earlier question
(https://groups.google.com/d/topic/elasticsearch/IU0b09LYs98/discussion),
and Adrien's answer along with your clarification helps.

Again, I am only a (very happy) user of ES and not a developer of ES. But
my experience has been that once the delete completes and returns to my
caller, that record is as good as gone (whether hidden or expunged, it
doesn't seem to matter).

So I would hazard a guess and say that out of order actions do matter. In
fact, I have one application where I get a stream of updates as a mix of
either add/update or delete. The order is important; the customer has
specifically ordered the transactions so that the most recent update to a
document is the one that should be reflected after all updates have been
applied. Therefore, I never split an update stream to multiple threads;
instead, I always issue the updates sequentially so that they are always
applied in the correct order. Happily, ES performance and back-end
threading is such that they breeze through. (And all this is against an
index with about 97 million documents with daily updates totaling perhaps
200K to several million per day).

I haven't fully explored external versioning, but the little I have explored
leads me to think that as long as your application can assign version
numbers that correspond to the intended order, out-of-order transactions will
still yield the correct final result.

For example, if your v1, v2, and v3 are indeed increasing version numbers,
then the following two series will result in the same state: the document
is deleted.

index v1, delete v3, index v2

index v1, index v2, delete v3

This guess is based on my experience that even a deleted document's version
is somehow seen by ES. If I use internal version numbers, then a
successful sequence of {create, index, delete, index, delete, index}
results in a document version of 6. But I'm not fully sure how external
versioning works with deleted documents. If it works as it does with
internal versioning, then you should be OK.
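
The two series above can be replayed with a small Python sketch. It assumes a plain "highest version wins" rule in which a delete's version is remembered indefinitely; the function apply_ops and the (kind, version) tuples are hypothetical names for illustration and say nothing about how ES handles this internally.

```python
def apply_ops(ops):
    """Replay (kind, version) operations under a "highest version wins" rule.

    Returns (last_accepted_version, is_deleted). Assumes the version of a
    delete is never forgotten, which is the guess made above, not a confirmed
    property of Elasticsearch.
    """
    version = None
    deleted = False
    for kind, op_version in ops:
        if version is not None and op_version <= version:
            continue  # stale operation: would be a version conflict
        version = op_version
        deleted = (kind == "delete")
    return version, deleted


series_a = [("index", 1), ("delete", 3), ("index", 2)]
series_b = [("index", 1), ("index", 2), ("delete", 3)]

state_a = apply_ops(series_a)
state_b = apply_ops(series_b)
```

Under that assumption both series end with the document deleted at version 3, because the late "index v2" in the first series is dropped as stale.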

Brian

(In reply to Stefan Fußenegger's message of Monday, July 29, 2013 12:26:34 PM UTC-4.)