Purge the deleted documents on disk

Hi all,

Is there any API to clear all the deleted documents on disk?
I read that:
Deleting a document doesn’t immediately remove the document from disk — it
just marks it as deleted.
Elasticsearch will clean up deleted documents in the background as you
continue to index more data.

Requesting the index stats shows:

"primaries": {
  "docs": {
    "count": 3268352,
    "deleted": 71249
  } ....}

I ran refresh and optimize on my index, but that only cleaned a small number
of deleted documents from disk. Is there any way I can clean all the
deleted docs, i.e. reduce this number to 0?
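
For a sense of scale, the share of deleted docs can be worked out from the two stats values above. This is only a minimal sketch: it assumes "count" excludes deleted docs, so the total on disk is count + deleted, and the helper name deleted_pct is made up for illustration.

```shell
#!/bin/sh
# deleted_pct COUNT DELETED
# Prints the percentage of docs on disk that are marked deleted,
# assuming "count" excludes deleted docs (total = count + deleted).
deleted_pct() {
  awk -v c="$1" -v d="$2" 'BEGIN { printf "%.2f\n", 100 * d / (c + d) }'
}

# Values taken from the index stats quoted above.
deleted_pct 3268352 71249
```

With these numbers the deleted share comes out to roughly 2% of the docs on disk.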

Thanks
Wei

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/adc8877e-dbc4-4751-bf4b-05f346e6c497%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Hello Wei ,

You can use the optimize API with max_num_segments set to 1 or
only_expunge_deletes set to true.
OPTIMIZE - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/indices-optimize.html#indices-optimize
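
As a rough sketch, the two options could be issued as separate requests, treating them as alternatives rather than combining them in one call. The host is assumed to be the localhost:9200 default, and the index name myindex is a placeholder, not something taken from this thread. Nothing is sent unless RUN=1 is set; otherwise the commands are only printed.

```shell
#!/bin/sh
# Placeholders: adjust ES_URL and INDEX for your cluster.
ES_URL="${ES_URL:-http://localhost:9200}"
INDEX="${INDEX:-myindex}"   # hypothetical index name

# Build the _optimize URL for a given query string.
optimize_url() {
  printf '%s/%s/_optimize?%s\n' "$ES_URL" "$INDEX" "$1"
}

# Send the request when RUN=1, otherwise just print the command.
send() {
  if [ "${RUN:-0}" = "1" ]; then
    curl -XPOST "$1"
  else
    echo "curl -XPOST '$1'"
  fi
}

# Option 1: only rewrite segments that carry enough deletes.
send "$(optimize_url 'only_expunge_deletes=true')"

# Option 2: merge everything down to a single segment.
send "$(optimize_url 'max_num_segments=1')"
```

Merging down to one segment rewrites the whole index, so the second option is the heavier of the two.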

Thanks
Vineeth

On Sat, Sep 13, 2014 at 5:32 AM, Wei wshen@groupon.com wrote:
> Is there any way I can clean all the deleted docs? to reduce this number to 0?

Hi Vineeth,

Thanks a lot for your response.

However I've tried setting max_num_segments to 1 and only_expunge_deletes
to true, using the command

curl -XPOST 'http://localhost:9200/_optimize?only_expunge_deletes=true&max_num_segments=1'

This only cleared some of the deleted documents; e.g., the number of deleted
documents went from 8k to 7k, but it did not merge away all of them. I'm
running ES v0.20.5. Is there a way I can force-merge all deleted documents?

P.S. I'm upgrading ES from 0.20.5 to 1.2.1, and shards are damaged after the
migration, with complaints about the write handler for the .del files. I have
a feeling this is because of the un-merged deleted documents.
Thanks
Wei

On Friday, September 12, 2014 8:16:25 PM UTC-7, vineeth mohan wrote:
> You can use the optimize API with max_num_segments set to 1 or
> only_expunge_deletes set to true.

By default, when expunging deletes, Lucene/ES will only merge away a segment
if it has "enough" deletes, where "enough" defaults to 10% of the segment.
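
To make the threshold concrete, here is a small sketch of the rule as described: a segment qualifies only when its percentage of deleted docs is above the threshold. The function name and the example segment sizes are invented for illustration, and the exact comparison Lucene performs is an assumption here.

```shell
#!/bin/sh
# segment_qualifies DELETED TOTAL [THRESHOLD_PCT]
# Exit status 0 if the segment's deleted percentage is above the threshold
# (default 10, as described above), 1 otherwise.
segment_qualifies() {
  awk -v d="$1" -v t="$2" -v p="${3:-10}" \
    'BEGIN { exit !((100 * d / t) > (p + 0)) }'
}

# Hypothetical segments: one with 5% deleted docs, one with 25%.
segment_qualifies 50 1000  && echo "merge" || echo "skip"
segment_qualifies 250 1000 && echo "merge" || echo "skip"
```

Under this rule a segment with a few percent of deletes is left alone, which would be consistent with only some deleted docs going away after an expunge.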

The setting is index.merge.policy.expunge_deletes_allowed ... so you can
change that if you want to.
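
If you do want to lower it, a sketch along these lines might work. It assumes the setting can be changed on a live index through the update settings API in your version, that the node is at localhost:9200, and that the index is called myindex (a placeholder). Nothing is sent unless RUN=1 is set; otherwise the request body is only printed.

```shell
#!/bin/sh
# Placeholders: adjust ES_URL and INDEX for your cluster.
ES_URL="${ES_URL:-http://localhost:9200}"
INDEX="${INDEX:-myindex}"   # hypothetical index name

# JSON body that sets the expunge threshold to the given percentage.
settings_body() {
  printf '{"index.merge.policy.expunge_deletes_allowed": %s}\n' "$1"
}

if [ "${RUN:-0}" = "1" ]; then
  # Lower the threshold so that segments with any deletes qualify...
  curl -XPUT "$ES_URL/$INDEX/_settings" -d "$(settings_body 0)"
  # ...then ask for deletes to be expunged.
  curl -XPOST "$ES_URL/$INDEX/_optimize?only_expunge_deletes=true"
else
  settings_body 0
fi
```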

However I would strongly advise not worrying about this: merging is a very
costly operation, and Lucene naturally merges segments over time, favoring
segments that have more deletions. Furthermore, deletions are not that
costly because deleted docs are filtered out at a very low level in Lucene,
before expensive query matching/scoring computations even see them.

Only if your index will never change again is it worth forcing merges ...

Mike McCandless

http://blog.mikemccandless.com

On Sun, Sep 14, 2014 at 7:19 PM, Wei wshen@groupon.com wrote:
> I'm running es v0.20.5. Is there a way I can force merge all deleted documents?