Clear deleted docs


(Christophe Rosko) #1

Hi !

I have many deleted documents in my index and I'd like to clear them so
that df/ttf statistics aren't skewed by these deleted docs.
I tried the following command:
curl -XPOST 'http://localhost:9200/myindex/_optimize?only_expunge_deletes=true'
But the number of deleted documents is still the same, as I can see using
the head plugin.

How can I clear these deleted documents?

Thanks for your help!

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/d1dac23e-8d48-4a9b-9883-2f3e3d7e34b2%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


(vineeth mohan-2) #2

Hi ,

Ideally, the entire set of deleted documents is removed only when you
execute a full merge into 1 segment.
But this takes a lot of time if you have a large index.

$ curl -XPOST 'http://localhost:9200/twitter/_optimize?max_num_segments=1'

With only_expunge_deletes, you are asking ES to prioritize the segments
that have a higher number of deleted documents when merging. But if the
merge policy decides it doesn't need to merge anything more, it won't
perform the merge.
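
For reference, here is a minimal Python sketch of the two calls discussed above. It only builds the request URLs; the `post` helper is a hypothetical convenience that needs a running node and is not called here. The host and index names are the ones used in this thread, and the helper names are my own.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def optimize_url(base, index, **params):
    """Build an _optimize URL for one index with the given query parameters."""
    return f"{base}/{index}/_optimize?{urlencode(params)}"


def post(url):
    """Send an empty POST and return the response body (needs a running node)."""
    with urlopen(Request(url, data=b"", method="POST")) as resp:
        return resp.read().decode("utf-8")


base = "http://localhost:9200"  # assumption: a local node, as in the thread

# Full merge into a single segment: drops all deleted docs, but is slow.
full_merge = optimize_url(base, "twitter", max_num_segments=1)

# Expunge only: rewrites just the segments the merge policy considers worth it.
expunge = optimize_url(base, "myindex", only_expunge_deletes="true")

print(full_merge)
print(expunge)
```

Calling `post(full_merge)` against a real cluster would trigger the merge, so it is best tried on a test index first.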

Thanks
Vineeth Mohan,
Elasticsearch consultant,
qbox.io ( Elasticsearch service provider http://qbox.io/)



(Christophe Rosko) #3

Hi Vineeth,

Thanks for your answer.
So I understand that the only sure way to clear all deleted documents is
to use max_num_segments=1, and in my case this takes too long to be run
often.

In fact, my problem is that the df statistic seems to be calculated over
all docs, including deleted docs.
Do you know if there is a way to have it calculated only over docs that
are not deleted?

Thanks



(Michael McCandless) #4

Note that only_expunge_deletes=true will only merge a segment away if it
has > 10% deleted docs by default; otherwise it leaves the segment as is.

See
http://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-merge.html
on how to change that 10% default.
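
To illustrate why the deleted count can stay the same after an expunge, here is a toy Python model of the threshold rule. It is a simplification, not how Lucene merges segments internally: each segment is just a pair of live and deleted doc counts, and the segment sizes are made-up numbers. The 10% default corresponds to the `index.merge.policy.expunge_deletes_allowed` setting described on the page linked above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    live: int
    deleted: int

    @property
    def deleted_pct(self):
        total = self.live + self.deleted
        return 100.0 * self.deleted / total if total else 0.0


def expunge_deletes(segments, allowed_pct=10.0):
    """Rewrite only segments whose deleted percentage exceeds the threshold."""
    return [
        Segment(s.live, 0) if s.deleted_pct > allowed_pct else s
        for s in segments
    ]


def force_merge(segments):
    """Merge everything into one segment, dropping every deleted doc."""
    return [Segment(sum(s.live for s in segments), 0)]


# Hypothetical index: one segment at 5% deletes, one at 40% deletes.
segments = [Segment(live=950, deleted=50), Segment(live=600, deleted=400)]

after_expunge = expunge_deletes(segments)
after_full = force_merge(segments)

print(sum(s.deleted for s in after_expunge))  # deletes left after expunge
print(sum(s.deleted for s in after_full))     # deletes left after full merge
```

In this model the 5% segment is left alone by the expunge, so some deleted docs remain; if every segment is under the threshold, nothing changes at all.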

But it's almost always unproductive to fret about deleted docs; see this
blog post I wrote recently:
https://www.elastic.co/blog/lucenes-handling-of-deleted-documents

Also the docFreq stat for a given term will always include the deleted docs
but this is usually not a problem unless the docs you've deleted vary
drastically statistically from the remaining docs.
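
As a rough worked example of that skew, the Python sketch below assumes Lucene's classic TF-IDF formula, idf = 1 + ln(numDocs / (docFreq + 1)), with both counts including deleted docs until they are merged away. The doc counts are invented for illustration: a term that appears in 10 live docs, plus 90 older versions of those docs that were deleted by updates.

```python
import math


def classic_idf(doc_freq, num_docs):
    """Classic Lucene TF-IDF inverse document frequency."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical numbers: 1000 live docs, 10 of which contain the term.
live_docs, live_df = 1000, 10

# 90 deleted (superseded) docs that also contain the term still count.
deleted_with_term = 90

idf_live_only = classic_idf(live_df, live_docs)
idf_with_deletes = classic_idf(
    live_df + deleted_with_term, live_docs + deleted_with_term
)

# The term looks far more common than it is among live docs,
# so its idf, and therefore its score contribution, is lower.
print(round(idf_live_only, 3), round(idf_with_deletes, 3))
```

When deletes are spread evenly across terms, both counts grow in proportion and the ratio barely moves, which is why this usually does not matter.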

Mike McCandless

http://www.elastic.co



(Christophe Rosko) #5

Hi Mike,

As I have lots of updates only on recent docs, terms that have been
recently indexed have their docFreq really impacted for a while.
I'll keep thinking about a way to limit the impact on docFreqs and not
worry too much about deleted docs.

Thanks for your answer and your post, it made things much clearer for me.

Christophe



(Inderjeet Singh) #6

I tried the method suggested by Vineeth:
$ curl -XPOST 'http://localhost:9200/twitter/_optimize?max_num_segments=1'

It reduced the data by around 80 MB, but the number of documents is still the same. I need a way of deleting all the documents. Please assist.
Thanks

