Segments marked as deleted?

Hi,

I use SegmentSpy to watch segment merges in ES, but I see several
"Deleted Docs" entries that look like "Deleted Docs: 1.55630".
I know ES will mark items as deleted and remove them later through GC.

But what do those deleted segments mean? Why doesn't ES/Lucene remove
them? There have been no updates to this index for the past 10 days.
How can I clean them out explicitly?

Thank you.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/8f0458a6-53ee-4e38-a3cc-6564eb693040%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

ES removes deleted docs via merging, not GC.

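You can force a merge using the optimize API
(reference/current/indices-optimize.html#indices-optimize), but be aware
that it is resource intensive.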
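
For anyone who wants to script this, here is a minimal Python sketch that
builds the _optimize request for the host and index placeholders used in
this thread (foo and bar). It assumes the ES 1.x _optimize endpoint and its
only_expunge_deletes and max_num_segments query parameters; the helper name
optimize_request is made up for illustration. The request is only built,
not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def optimize_request(host, index, only_expunge_deletes=False, max_num_segments=None):
    """Build (but do not send) a POST request for the ES 1.x _optimize API."""
    params = {}
    if only_expunge_deletes:
        params["only_expunge_deletes"] = "true"
    if max_num_segments is not None:
        params["max_num_segments"] = str(max_num_segments)
    url = "{}/{}/_optimize".format(host.rstrip("/"), index)
    if params:
        # Sort so the query string is stable regardless of insertion order.
        url += "?" + urlencode(sorted(params.items()))
    return Request(url, data=b"", method="POST")


# Host and index are the placeholders used in this thread.
req = optimize_request("http://foo:9200", "bar", only_expunge_deletes=True)
print(req.get_method(), req.full_url)

# To actually run it against a live node (heavy I/O, so pick a quiet period):
# urlopen(req).read()
```

A plain _optimize call may leave some segments untouched, so a handful of
deleted docs can remain afterwards; passing max_num_segments=1 asks for a
full rewrite, at a correspondingly higher cost.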

On 3 February 2015 at 18:26, Jason Zhang mock2u@gmail.com wrote:

How can I clean them out explicitly?

--

I've tried curl -XPOST http://foo:9200/bar/_optimize, but there's still
some "Deleted Docs" here. Is this normal?

On Tuesday, February 3, 2015 at 3:31:02 PM UTC+8, Mark Walkom wrote:

ES removes deleted docs via merging, not GC.

--

How many?

The amount of deleted docs really depends on the atomicity of the index.
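
One way to answer "how many" without a plugin is to sum the per-segment
counters from the indices segments API (GET http://foo:9200/bar/_segments).
The sketch below assumes the response layout I remember from ES 1.x, with
num_docs and deleted_docs on each segment; the sample JSON is hand-written
and its numbers are invented for illustration, not taken from this thread.

```python
import json


def count_deleted(segments_response):
    """Sum num_docs and deleted_docs over every segment of every shard copy."""
    live = deleted = 0
    for index in segments_response["indices"].values():
        for shard_copies in index["shards"].values():
            # Each shard maps to a list of copies (primary and replicas).
            for copy in shard_copies:
                for seg in copy["segments"].values():
                    live += seg["num_docs"]
                    deleted += seg["deleted_docs"]
    return live, deleted


# Hand-written sample shaped like a _segments response (invented numbers).
sample = json.loads("""
{"indices": {"bar": {"shards": {"0": [
  {"segments": {
    "_0": {"num_docs": 1200, "deleted_docs": 3},
    "_1": {"num_docs": 800,  "deleted_docs": 0}
  }}
]}}}}
""")

live, deleted = count_deleted(sample)
print("live={} deleted={} ({:.3%} of all docs)".format(
    live, deleted, deleted / (live + deleted)))
```

Because replicas are listed as separate shard copies, the totals count each
replica's segments as well.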

On 3 February 2015 at 18:44, Jason Zhang mock2u@gmail.com wrote:

I've tried curl -XPOST http://foo:9200/bar/_optimize, but there's still
some "Deleted Docs" here. Is this normal?

--

Also I forgot to say I didn't do any deletions in this index.

--

Very few.
Every "Deleted Docs" entry reads something like "Deleted Docs: 1.5563",
which is not an integer. What does the number mean?

The sum may be less than 10 while the index has over 3 million docs.

On Tuesday, February 3, 2015 at 3:52:59 PM UTC+8, Mark Walkom wrote:

How many?

--

This only means that segment merging is going on. Merging produces
generations of segments that are obsolete; these are internally marked as
"deleted". Later, the obsolete generations are cleaned up automatically
when the segment is touched by a significant amount of new data. Do not
worry about that; there is nothing you need to do. Of course, you can play
with the _optimize API, but the downside is that it imposes a heavy load on
the node.

Jörg

--

Yeah, but if you update a doc, ES will delete it and insert the new version
behind the scenes.

--
David :wink:
Twitter : @dadoonet / @elasticsearchfr / @scrutmydocs
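
The "update is a delete plus an insert" point can be illustrated with a toy
model. This is a deliberately simplified Python sketch: ToyIndex and its
methods are made up for illustration and do not correspond to any
Elasticsearch or Lucene API. It only shows why deleted docs can appear
without any explicit delete, and why a merge makes them go away.

```python
class ToyIndex:
    """Toy model of an append-only index; not how Lucene is implemented."""

    def __init__(self):
        # Each segment is a list of [doc_id, body, is_live] entries.
        self.segments = [[]]

    def index(self, doc_id, body):
        # An update marks the old copy as deleted, then appends a new copy.
        for segment in self.segments:
            for entry in segment:
                if entry[0] == doc_id and entry[2]:
                    entry[2] = False
        self.segments[-1].append([doc_id, body, True])

    def refresh(self):
        # Start a new segment; earlier segments are no longer appended to.
        self.segments.append([])

    def live_docs(self):
        return sum(1 for seg in self.segments for e in seg if e[2])

    def deleted_docs(self):
        return sum(1 for seg in self.segments for e in seg if not e[2])

    def merge(self):
        # Rewrite all segments into one, copying only the live entries.
        merged = [e for seg in self.segments for e in seg if e[2]]
        self.segments = [merged]


idx = ToyIndex()
idx.index("1", "first version")
idx.refresh()
idx.index("1", "second version")  # an update; no explicit delete issued
before = (idx.live_docs(), idx.deleted_docs())
idx.merge()
after = (idx.live_docs(), idx.deleted_docs())
print("before merge:", before, "after merge:", after)
```

In this model the stale copy stays in its old segment, flagged as deleted,
until a merge rewrites the segments and leaves it out.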

On 3 Feb 2015 at 08:53, Jason Zhang mock2u@gmail.com wrote:

Also I forgot to say I didn't do any deletions in this index.
