Sustainable way to regularly purge deleted docs

Jonathan_Foy · August 22, 2014, 3:27am

Hello

I'm in the process of putting a two-node Elasticsearch cluster (1.1.2) into
production, but I'm having a bit of trouble keeping it stable enough for
comfort. Specifically, I'm trying to figure out the best way to keep the
number of deleted documents under control.

Both nodes are r3.xlarge EC2 instances (4 cores, 30.5 GB RAM). The ES
cluster mirrors the primary data store, a MySQL database. Relevant updates
to the database are caught via triggers which populate a table that's
monitored by an indexing process. This results in what I'd consider a lot
of reindexing any time the primary data is updated. Search and indexing
performance thus far has been in line with expectations when the number of
deleted documents is small, but as it grows (up to 30-40%), the amount of
available RAM becomes limited, ultimately causing memory problems. If I
optimize/purge deletes then things return to normal, though I usually end
up having to restart at least one server if not both due to OOM problems
and shard failures during optimization. When ES becomes the source of all
searches for the application, I can't really afford this downtime.

What would be the preferred course of action here? I do have a window over
the weekend where I could work with somewhat reduced capacity; I was
thinking perhaps I could pull one node out of search rotation, optimize it,
swap it with the other, optimize it, and then go on my way. However, I
don't know that I CAN pull one node out of rotation (it seems like the
search API lets me specify a node, but nothing to say "Node X doesn't need
any searches"), nor does it appear that I can optimize an index on one node
without doing the same to the other.

I've tried tweaking the merge settings to favour segments containing large
numbers of deletions, but it doesn't seem to make enough of a difference.
I've also disabled merge throttling (I do have SSD-backed storage). Is
there any safe way to perform regular maintenance on the cluster,
preferably one node at a time, without causing TOO many problems? Am I
just trying to do too much with the hardware I have?

Any advice is appreciated. Let me know what info I left out that would
help.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/65b96db1-0e56-4681-b73d-c21365983199%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

jpountz · August 22, 2014, 11:14pm

Hi Jonathan,

The default merge policy is already supposed to merge segments that contain
lots of deleted documents quite aggressively, so it is a bit surprising
that you are seeing that many deleted documents, even with merge
throttling disabled.

You mention memory pressure because of the number of documents in
your index. Do you know what causes this memory pressure? If it is due
to field data, maybe you could consider storing field data on disk (what we
call "doc values")?

--
Adrien Grand


Jonathan_Foy · August 23, 2014, 3:08pm

Hello

I was a bit surprised to see the number of deleted docs grow so large, but
I won't rule out my having something set up wrong. Non-default merge
settings are below; by all means let me know if I've done something stupid:

indices.store.throttle.type: none
index.merge.policy.reclaim_deletes_weight: 6.0
index.merge.policy.max_merge_at_once: 5
index.merge.policy.segments_per_tier: 5
index.merge.policy.max_merged_segment: 2gb
index.merge.scheduler.max_thread_count: 3

I make extensive use of nested documents, and to a smaller degree child
docs. Right now things are hovering around 15% deleted after a cleanup on
Wednesday. I've also cleaned up my mappings a lot since I saw the 45%
deleted number (less redundant data, broke some things off into child docs
to maintain separately), but it was up to 30% this last weekend. When I
looked in the past, back when I saw the 40+% numbers, the segments in the
largest tier (2 GB) would sometimes have 50+% deleted docs in them. The
smaller segments all seemed pretty contained, which I guess makes sense as
they didn't stick around for nearly as long.
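
To make percentages like the ones above concrete, here is a minimal sketch of how a deleted-doc share can be computed per segment. It assumes the live-document count excludes deleted documents (as the `num_docs` and `deleted_docs` fields of the segments API are generally understood to do); the segment names and counts are invented for illustration and are not from this cluster.

```python
def deleted_percentage(num_docs: int, deleted_docs: int) -> float:
    """Share of a segment (or index) made up of deleted documents, in percent.

    num_docs is assumed to count live documents only, so the total is
    num_docs + deleted_docs.
    """
    total = num_docs + deleted_docs
    if total == 0:
        return 0.0
    return 100.0 * deleted_docs / total


# Hypothetical per-segment figures, invented for illustration only.
segments = {
    "_a": {"num_docs": 500_000, "deleted_docs": 500_000},
    "_b": {"num_docs": 90_000, "deleted_docs": 10_000},
}

for name, seg in segments.items():
    pct = deleted_percentage(seg["num_docs"], seg["deleted_docs"])
    print(f"{name}: {pct:.1f}% deleted")
```

Tracking this per segment rather than per index shows whether the deletes are concentrated in the largest segments, which is the pattern described above.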

As for where the memory is spent, according to ElasticHQ, right now on one
server I have a 20 GB heap (out of 30.5, which I know is above the 50%
suggested, just trying to get things to work), I'm using 90% as follows:

Field cache: 5.9 GB
Filter cache: 4.0 GB (I had reduced this before the last restart, but
forgot to make the changes permanent. I do use a lot of filters though, so
would like to be able to use the cache).
ID cache: 3.5 GB

Node stats "Segments: memory_in_bytes": 6.65 GB (I'm not exactly sure how
this one contributes to the total heap number).
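
For reference, a quick tally of the figures above. This is only a sketch: it assumes the four numbers are meant to be additive, which is not confirmed anywhere in this thread. The components sum to slightly more than 90% of a 20 GB heap, so they may overlap or may have been sampled at different moments.

```python
# Figures quoted in the post, in GB.
heap_gb = 20.0
heap_used_fraction = 0.90
reported = {
    "field cache": 5.9,
    "filter cache": 4.0,
    "id cache": 3.5,
    "segments memory_in_bytes": 6.65,
}

heap_used_gb = heap_gb * heap_used_fraction
reported_total_gb = sum(reported.values())

print(f"heap in use:       {heap_used_gb:.2f} GB")
print(f"sum of components: {reported_total_gb:.2f} GB")
print(f"difference:        {reported_total_gb - heap_used_gb:.2f} GB")
```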

As for the disk-based "doc values", I don't know how I have not come across
them thus far, but that sounds quite promising. I'm a little late in the
game to be changing everything yet again, but it may be a good idea
regardless, and is definitely something I'll read more about and consider
going forward. Thank you for bringing it to my attention.

Anyway, my current plan, since I'm running in AWS and have the flexibility,
is just to add another r3.xlarge node to the cluster over the weekend, try
the deleted-doc purge, and then pull the node back out after moving all
shards off of it. I'm hoping this will allow me to clean things up with
extra horsepower, but not increase costs too much throughout the week.

Thanks for your input, it's very much appreciated.


jpountz · August 25, 2014, 9:24am

I left some comments inline:

On Sat, Aug 23, 2014 at 5:08 PM, Jonathan Foy thefoy@gmail.com wrote:

I was a bit surprised to see the number of deleted docs grow so large, but
I won't rule out my having something setup wrong. Non-default merge
settings are below, by all means let me know if I've done something stupid:

indices.store.throttle.type: none
index.merge.policy.reclaim_deletes_weight: 6.0
index.merge.policy.max_merge_at_once: 5
index.merge.policy.segments_per_tier: 5
index.merge.policy.max_merged_segment: 2gb
index.merge.scheduler.max_thread_count: 3

These settings don't look particularly bad, but merge policy tuning is
quite hard and I tend to refrain from modifying the default
parameters. If you're interested you can have a look at

to get a sense of the challenges of a good merge policy.

I make extensive use of nested documents, and to a smaller degree child
docs. Right now things are hovering around 15% deleted after a cleanup on
Wednesday. I've also cleaned up my mappings a lot since I saw the 45%
deleted number (less redundant data, broke some things off into child docs
to maintain separately), but it was up to 30% this last weekend. When I've
looked in the past when I saw the 40+% numbers, the segments in the largest
tier (2 GB) would sometimes have up to 50+% deleted docs in them, the
smaller segments all seemed pretty contained, which I guess makes sense as
they didn't stick around for nearly as long.

As for where the memory is spent, according to ElasticHQ, right now on one
server I have a 20 GB heap (out of 30.5, which I know is above the 50%
suggested, just trying to get things to work), I'm using 90% as follows:

Field cache: 5.9 GB
Filter cache: 4.0 GB (I had reduced this before the last restart, but
forgot to make the changes permanent. I do use a lot of filters though, so
would like to be able to use the cache).
ID cache: 3.5 GB

If you need to get some memory back, you can decrease the size of your
filter cache (uncached filters happen to be quite fast already!) to e.g. 1 GB,
in combination with opting out of caching filters in your queries.
Term filters are typically cached by default although they don't really
need to be; you can quite safely turn caching off on them, especially if there
is no particular reason that they would be reused across queries.
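
As a rough illustration of opting a filter out of caching, the sketch below builds a 1.x-style filtered query whose term filter carries `_cache: false`. The field name and value are hypothetical, and the code only builds the request body; it does not talk to a cluster. The cache size itself is a separate node-level setting (`indices.cache.filter.size` in the 1.x line, as far as I recall), so check the documentation for your version.

```python
import json


def uncached_term_filter(field: str, value) -> dict:
    """Term filter that asks Elasticsearch 1.x not to cache its result."""
    return {"term": {field: value, "_cache": False}}


def filtered_query(filter_clause: dict) -> dict:
    """Wrap a filter in a 1.x-style filtered query that matches all docs."""
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": filter_clause,
            }
        }
    }


# "status" and "active" are hypothetical field/value names.
body = filtered_query(uncached_term_filter("status", "active"))
print(json.dumps(body, indent=2))
```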

Node stats "Segments: memory_in_bytes": 6.65 GB (I'm not exactly sure how
this one contributes to the total heap number).

This is the amount of memory that is used by the index itself. It mostly
loads some small data-structures in memory in order to make search fast. I
said mostly because there is one that can be quite large: the bloom filters
that are loaded to save disk seeks when doing primary-key lookups. We
recently made good improvements that make this bloom filter not necessary
anymore and in 1.4 it will be disabled by default:

github.com/elastic/elasticsearch: "Disable loading of bloom filters by default"
(elastic:master ← mikemccand:nobloomfilters, opened by mikemccand, 02:41PM - 22 Jul 14 UTC, +15 -10)

This commit changes the default for index.codec.bloom.load to false, because bloom filters can use a sizable amount of RAM on indices with many tiny documents, and now only gives smallish index-time performance gains for apps that update (not just append) documents, since we've separately improved performance for ID lookups with #6298. Closes #6349

You can already unload it by setting index.codec.bloom.load
to false (it's a live setting, so no need to restart or reopen the index).
Note, however, that this might hurt indexing speed.
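
A minimal sketch of what that live settings change could look like as an HTTP request. The index name is hypothetical, and the flat key form in the body is an assumption about what the update-settings endpoint accepts on your version; the code only assembles the request and sends nothing.

```python
import json


def bloom_load_request(index: str, load: bool = False):
    """Build (method, path, body) for an index settings update.

    Nothing is sent; pass the pieces to whatever HTTP client you use.
    """
    path = f"/{index}/_settings"
    body = json.dumps({"index.codec.bloom.load": load})
    return "PUT", path, body


# "products" is a hypothetical index name.
method, path, body = bloom_load_request("products")
print(method, path, body)
```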


govind201 · December 2, 2014, 10:54pm

Jonathan,
Did you find a solution to this? I've been facing pretty much the same
issue since I've added nested documents to my index - delete percentage
goes really high and an explicit optimize leads to an OOM.
Thanks.


Jonathan_Foy · December 3, 2014, 2:38am

Hello

This is something I still struggle with, though not to the degree that I
once did. I've been in production for several months now with limited
issues, though I still don't consider it to be a solved problem for myself,
as it requires regular manual maintenance.

First, I threw more hardware at it. I moved from two full time nodes to
three. This helped quite a bit, and I definitely needed it once more users
started hitting the cluster and more data was cached (I also added more
warmers once usage patterns became clearer).

Second, I've fine-tuned my sync process quite a bit to avoid unnecessary
reindexing.

Third, since I'm running this cluster on EC2 instances, I just spin up more
nodes when I need to clean things up, then drop the number after for normal
use. I had been moving to four, but now I sometimes add a fifth depending
upon shard allocation - sometimes I seem to accumulate the most active
shards on the same node, and I still run into memory issues. I also drop
the filter cache to almost nothing before I run the optimize/delete step.
For the most part, this gets me through with minimal memory issues, though
I'd be screwed if I had to do this during the day. Also, there IS overhead
to moving shards across nodes, and long query times when (I presume) the
shards become active on the new nodes and any non-warmed fields are
loaded.
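
The routine described above could be sketched roughly as the ordered list of requests below. This is an illustration under several assumptions, not the exact procedure used here: the index name, node address, and cache percentages are made up; restoring the cache afterwards is assumed; and whether `indices.cache.filter.size` can be changed at runtime depends on the Elasticsearch version. The code only builds the requests and does not contact a cluster.

```python
import json


def purge_plan(index: str, spare_node_ip: str):
    """Return the maintenance steps as (method, path, body) tuples, in order."""
    return [
        # 1. Shrink the filter cache before the purge (value is illustrative).
        ("PUT", "/_cluster/settings",
         json.dumps({"transient": {"indices.cache.filter.size": "1%"}})),
        # 2. Merge away deleted documents only, rather than a full optimize.
        ("POST", f"/{index}/_optimize?only_expunge_deletes=true", None),
        # 3. Restore the filter cache size (value is illustrative).
        ("PUT", "/_cluster/settings",
         json.dumps({"transient": {"indices.cache.filter.size": "20%"}})),
        # 4. Drain the temporary node so it can be shut down afterwards.
        ("PUT", "/_cluster/settings",
         json.dumps({"transient": {
             "cluster.routing.allocation.exclude._ip": spare_node_ip}})),
    ]


# Hypothetical index name and node address.
for method, path, body in purge_plan("products", "10.0.0.5"):
    print(method, path, body or "")
```

Keeping the steps as data makes it easy to review the order before running anything, and to wait for shard relocation to finish between steps.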

So, not a perfect solution by any means, but it's working.

Which version of ES are you on? I'm still on 1.1.2, with plans to update
soon, and am very much hoping that the update will help things become more
hands-off. The bloom filter being rendered unnecessary should free memory,
and there are general performance improvements as well, though I can't
remember them all offhand. Being able to actually update the merge settings
dynamically will also be a big help in testing various configs.

Hope something in there helps. I'm definitely open to suggestions on ways
to improve things.

On Tuesday, December 2, 2014 5:54:13 PM UTC-5, Govind Chandrasekhar wrote:

Jonathan,
Did you find a solution to this? I've been facing pretty much the same
issue since I've added nested documents to my index - delete percentage
goes really high and an explicit optimize leads to an OOM.
Thanks.

On Saturday, August 23, 2014 8:08:32 AM UTC-7, Jonathan Foy wrote:

Hello

I was a bit surprised to see the number of deleted docs grow so large,
but I won't rule out my having something setup wrong. Non-default merge
settings are below, by all means let me know if I've done something stupid:

indices.store.throttle.type: none
index.merge.policy.reclaim_deletes_weight: 6.0
index.merge.policy.max_merge_at_once: 5
index.merge.policy.segments_per_tier: 5
index.merge.policy.max_merged_segment: 2gb
index.merge.scheduler.max_thread_count: 3

I make extensive use of nested documents, and to a smaller degree child
docs. Right now things are hovering around 15% deleted after a cleanup on
Wednesday. I've also cleaned up my mappings a lot since I saw the 45%
deleted number (less redundant data, broke some things off into child docs
to maintain separately), but it was up to 30% this last weekend. When I've
looked in the past when I saw the 40+% numbers, the segments in the largest
tier (2 GB) would sometimes have up to 50+% deleted docs in them, the
smaller segments all seemed pretty contained, which I guess makes sense as
they didn't stick around for nearly as long.

As for where the memory is spent, according to ElasticHQ, right now on
one server I have a 20 GB heap (out of 30.5, which I know is above the 50%
suggested, just trying to get things to work), I'm using 90% as follows:

Field cache: 5.9 GB
Filter cache: 4.0 GB (I had reduced this before the last restart, but
forgot to make the changes permanent. I do use a lot of filters though, so
would like to be able to use the cache).
ID cache: 3.5 GB

Node stats "Segments: memory_in_bytes": 6.65 GB (I'm not exactly sure how
this one contributes to the total heap number).
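The figures above can be added up to see how much of the used heap the three caches explain. This is just arithmetic on the numbers quoted in this post; the variable names are illustrative, and it assumes all figures were read at the same moment.

```python
heap_gb = 20.0          # configured heap
used_fraction = 0.90    # share of the heap reported as in use
caches_gb = {"field cache": 5.9, "filter cache": 4.0, "id cache": 3.5}
segments_memory_gb = 6.65  # node stats "segments.memory_in_bytes"

used_gb = heap_gb * used_fraction
cache_total_gb = sum(caches_gb.values())
not_in_caches_gb = used_gb - cache_total_gb

print(f"heap in use:           {used_gb:.1f} GB")
print(f"three caches combined: {cache_total_gb:.1f} GB")
print(f"used but not in caches: {not_in_caches_gb:.1f} GB")
print(f"segments memory:       {segments_memory_gb:.2f} GB")
```

The three caches account for roughly 13.4 GB of the 18 GB in use, leaving about 4.6 GB for everything else, which is less than the 6.65 GB reported for segments.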

As for the disk-based "doc values", I don't know how I have not come
across them thus far, but that sounds quite promising. I'm a little late
in the game to be changing everything yet again, but it may be a good idea
regardless, and is definitely something I'll read more about and consider
going forward. Thank you for bringing it to my attention.

Anyway, my current plan, since I'm running in AWS and have the
flexibility, is just to add another r3.xlarge node to the cluster over the
weekend, try the deleted-doc purge, and then pull the node back out after
moving all shards off of it. I'm hoping this will allow me to clean things
up with extra horsepower, but not increase costs too much throughout the
week.
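One way to move all shards off a temporary node before shutting it down is the allocation-exclusion cluster setting. The sketch below only builds the request bodies for `PUT /_cluster/settings`; the IP address and the helper names are placeholders, and it does not talk to a cluster.

```python
import json

EXCLUDE_KEY = "cluster.routing.allocation.exclude._ip"


def exclude_node_settings(ip):
    """Body asking the cluster to move shards away from the node at `ip`."""
    return {"transient": {EXCLUDE_KEY: ip}}


def clear_exclusion_settings():
    """Body that clears the exclusion again once the node is gone."""
    return {"transient": {EXCLUDE_KEY: ""}}


# 10.0.0.5 is a made-up address for the temporary node.
print(json.dumps(exclude_node_settings("10.0.0.5"), indent=2))
print(json.dumps(clear_exclusion_settings(), indent=2))
```

Once the node holds no shards it can be stopped without the cluster having to recover anything.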

Thanks for your input, it's very much appreciated.

On Friday, August 22, 2014 7:14:18 PM UTC-4, Adrien Grand wrote:

Hi Jonathan,

The default merge policy is already supposed to quite aggressively merge
segments that contain lots of deleted documents, so it is a bit surprising
that you see that many deleted documents, even with merge throttling
disabled.

You mention having memory pressure because of the number of documents in
your index; do you know what causes this memory pressure? In case it is due
to field data, maybe you could consider storing field data on disk (what we
call "doc values")?


nik9000 · December 3, 2014, 4:29am

I've had some issues with high IO exacerbated by lots of deleted docs as
well. I'd get deleted docs in the 30%-40% range on some indexes. We
attacked the problem in two ways:

1. Hardware. More RAM and better SSDs really, really helped. No consumer
   grade SSDs for me.
2. Tweak some merge settings:
   - The most important setting is index.merge.policy.max_merged_segment.
     You never want your segments to get near that size, so set it to 30GB
     or something stupid huge. The way the merge policy works, segments
     near max_merged_segment in size will end up with tons and tons of
     deletes before they are considered for merging, and even then the
     merge policy will still shy away from merging them.
   - I raised reclaim_deletes_weight slightly (2.5 or 3 or so) and lowered
     segments_per_tier and max_merge_at_once to get slightly better search
     performance. These were likely less important.
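As a rough illustration, the settings mentioned above could be sent as the body of an index settings update (`PUT /<index>/_settings`). The 30gb and 2.5 values come from this post; the values for segments_per_tier and max_merge_at_once are placeholders, since no exact numbers were given, and the helper name is made up.

```python
import json


def merge_policy_settings(max_merged_segment="30gb",
                          reclaim_deletes_weight=2.5,
                          segments_per_tier=5,   # placeholder value
                          max_merge_at_once=5):  # placeholder value
    """Build an index-settings body with the merge policy knobs from the post."""
    return {
        "index.merge.policy.max_merged_segment": max_merged_segment,
        "index.merge.policy.reclaim_deletes_weight": reclaim_deletes_weight,
        "index.merge.policy.segments_per_tier": segments_per_tier,
        "index.merge.policy.max_merge_at_once": max_merge_at_once,
    }


print(json.dumps(merge_policy_settings(), indent=2))
```

Whether a live update takes effect immediately may depend on the Elasticsearch version, so it is worth checking the result on a test index first.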

I hope that helps some!

Nik


Jonathan_Foy · December 3, 2014, 1:32pm

Interesting...does the very large max_merged_segment not result in memory
issues when the largest segments are merged? When I run my cleanup
command (_optimize?only_expunge_deletes) I see a steep spike in memory as
each merge is completing, followed by an immediate drop, presumably as the
new segment is fully initialized and then the old ones are subsequently
dropped. I'd be worried that I'd run out of memory when initializing the
larger segments. That being said, I only notice the large spikes when
merging via the explicit optimize/only_expunge_deletes command; the
continuous merging throughout the day results in very mild spikes by
comparison.
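For reference, the cleanup command mentioned above is a POST to the index's `_optimize` endpoint with `only_expunge_deletes` set. The sketch below only builds the URL; the host, the index name, and the helper name are placeholders.

```python
from urllib.parse import urlencode


def expunge_deletes_url(base_url, index):
    """URL for an optimize call that only expunges deletes (send it with POST)."""
    query = urlencode({"only_expunge_deletes": "true"})
    return f"{base_url.rstrip('/')}/{index}/_optimize?{query}"


# localhost:9200 and "myindex" are made-up values.
print(expunge_deletes_url("http://localhost:9200", "myindex"))
```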

I guess I could always add a single node with the higher settings and just
drop it if it becomes problematic in order to test (since, though dynamic,
prior to 1.4 the merge settings only take effect on shard initialization if
I remember correctly).

Thanks for the advice though, I'll definitely try that.

Jonathan


nik9000 · December 3, 2014, 2:29pm

On Wed, Dec 3, 2014 at 8:32 AM, Jonathan Foy thefoy@gmail.com wrote:

Interesting...does the very large max_merged_segment not result in memory
issues when the largest segments are merged? When I run my cleanup
command (_optimize?only_expunge_deletes) I see a steep spike in memory as
each merge is completing, followed by an immediate drop, presumably as the
new segment is fully initialized and then the old ones are subsequently
dropped. I'd be worried that I'd run out of memory when initializing the
larger segments. That being said, I only notice the large spikes when
merging via the explicit optimize/only_expunge_deletes command; the
continuous merging throughout the day results in very mild spikes by
comparison.

I don't see memory issues but I'm not really looking for them. Memory
usage has never been a problem for us. IO spikes were a problem the few
times I ran only_expunge_deletes.

I'm forming the opinion that calling _optimize should be a pretty
remarkable thing. Like it should only be required when:

1. You are done writing an index and will never touch it again and want to
   save some space/make querying a bit faster.
2. You are working around some funky bug.
3. You've just built the index with funky merge settings that created a
   bazillion segments but imported quickly.
4. You shouldn't be calling it. Stop now. You've made a mistake.

I think that #1 and #3 aren't valid for only_expunge_deletes though. So
that leaves either - you are working around a bug or you are making a
mistake.

In your case I think your mistake is taking the default merge settings.
Maybe. Or maybe that is a bug. I'm not sure. If it is a mistake then you
are in good company.

Also! only_expunge_deletes is kind of a trappy name - what it really does
is smash all the segments with deletes together into one big segment, making
the max_merged_segment problem worse in the long run.

A steep spike in memory usage is probably not worth worrying about so long
as you don't see any full GCs done via stop the world (concurrent mode
failure). I'd expect to see more minor GCs during the spike and those are
stop the world but they should be pretty short. Elasticsearch should log
a WARNING or ERROR during concurrent mode failures. It also exposes
counters of all the time spent in minor and full GCs and you can jam those
into RRDtool to get some nice graphs. Marvel will probably do that for
you, I'm not sure. You can also use jstat -gcutil <pid> 1s 10000 to get
it to spit out the numbers in real time.
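The GC counters mentioned above can be compared between two samples to see how much collection happened during a merge. This is a minimal sketch: it assumes the `jvm` section of node stats has a `gc.collectors` map with `collection_count` and `collection_time_in_millis` per collector, and the sample numbers and the helper name are invented.

```python
def gc_deltas(before, after):
    """Per-collector change in count and time between two JVM stats samples."""
    deltas = {}
    for name, now in after["gc"]["collectors"].items():
        prev = before["gc"]["collectors"][name]
        deltas[name] = {
            "collections": now["collection_count"] - prev["collection_count"],
            "millis": (now["collection_time_in_millis"]
                       - prev["collection_time_in_millis"]),
        }
    return deltas


# Invented samples taken before and after a merge.
before = {"gc": {"collectors": {
    "young": {"collection_count": 1200, "collection_time_in_millis": 30000},
    "old": {"collection_count": 4, "collection_time_in_millis": 900},
}}}
after = {"gc": {"collectors": {
    "young": {"collection_count": 1260, "collection_time_in_millis": 31500},
    "old": {"collection_count": 4, "collection_time_in_millis": 900},
}}}

for name, delta in gc_deltas(before, after).items():
    print(f"{name}: {delta['collections']} collections, {delta['millis']} ms")
```

In this made-up sample only the young collector moved, which is the benign pattern described above; a jump in the old collector's time would be the one to worry about.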

I guess I could always add a single node with the higher settings and just
drop it if it becomes problematic in order to test (since, though dynamic,
prior to 1.4 the merge settings only take effect on shard initialization if
I remember correctly).

I'm pretty sure that is an index-level setting. Also, I think there was
an issue with applying it live in some versions, but I know it's fixed in
1.4. I'm pretty sure you can trick your way around the issue by moving the
shard to another node. It's kind of fun.
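Moving a shard by hand goes through the cluster reroute API (`POST /_cluster/reroute`) with a `move` command. The sketch below only builds the request body; the index name, shard number, node names, and the helper name are placeholders.

```python
import json


def move_shard_command(index, shard, from_node, to_node):
    """Body for a cluster reroute call that relocates one shard copy."""
    return {
        "commands": [
            {
                "move": {
                    "index": index,
                    "shard": shard,
                    "from_node": from_node,
                    "to_node": to_node,
                }
            }
        ]
    }


# "myindex", shard 0, and the node names are made up.
print(json.dumps(move_shard_command("myindex", 0, "node1", "node2"), indent=2))
```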

Thanks for the advice though, I'll definitely try that.

Good Luck!

Nik


govind201 · December 3, 2014, 8:04pm

Jonathan,

Your current setup doesn't look ideal. As Nikolas pointed out, optimize
should be run under exceptional circumstances, not for regular maintenance.
That's what the merge policy settings are for, and the right settings should
meet your needs, at least theoretically. That said, I can't say I've always
heeded this advice, since I've often resorted to using only_expunge_deletes
when things have gotten out of hand, because it's an easy remedy to a large
problem.

I'm trying out a different set of settings to those Nikolas just pointed
out. Since my issue is OOMs when merges take place, not so much I/O, I
figured the issue is with one of two things:

1. Too many segments are being merged concurrently.
2. The merged segments are large.

I reduced "max_merge_at_once", but this didn't fix the issue. So it had to
be that the segments being merged were quite large. I noticed that my
largest segments often formed >50% of each shard and had up to 30% deletes,
and OOMs occurred when these massive segments were being "merged" to
expunge deletes, since that led to the amount of data on the shard almost
doubling.

To remedy this, I've REDUCED the size of "max_merged_segment" (I can live
with more segments) and reindexed all of my data (since the new setting
doesn't shrink existing large segments). If I understand merge settings
correctly, this means that in the worst-case scenario, the amount of memory
used for merging will be (max_merged_segment x max_merge_at_once) GB.
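To put numbers on that estimate, here is the same product worked through for two of the configurations mentioned in this thread. It only mirrors the understanding stated above and is not a verified property of the merge policy; the helper name is made up.

```python
def worst_case_merge_gb(max_merged_segment_gb, max_merge_at_once):
    """Upper bound as described in the post: segment size cap times merge width."""
    return max_merged_segment_gb * max_merge_at_once


# 2gb cap with max_merge_at_once of 5 (the settings posted earlier in the thread).
print(worst_case_merge_gb(2, 5), "GB")

# 30gb cap with the same merge width, for comparison.
print(worst_case_merge_gb(30, 5), "GB")
```

By this estimate the bound grows linearly with the cap, which is the trade-off between the two approaches discussed here.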

As noted, these settings don't apply retrospectively to existing large
segments, hence the reindex. All of this was done in the last day or so,
so I've yet to see how it works out, though I'm optimistic.

By the way, I believe "max_merged_segment" limits are not observed for
explicit optimize, so at least in my setup, I'm going to have to shy away
from explicitly expunging deletes. It could be that in your case, because
of repeated explicit optimizes, or use of max_num_segments, coupled with
the fact that you have a lot of reindexing going on (and with child
documents at that, since any change in any one of the child documents
results in all other child documents and the parent document being marked
as deleted), things have gotten particularly out of hand.

On 3 December 2014 at 06:29, Nikolas Everett nik9000@gmail.com wrote:

On Wed, Dec 3, 2014 at 8:32 AM, Jonathan Foy thefoy@gmail.com wrote:

Interesting...does the very large max_merged_segment not result in memory
issues when the largest segments are merged? When I run my cleanup
command (_optimize?only_expunge_deletes) I see a steep spike in memory as
each merge is completing, followed by an immediate drop, presumably as the
new segment is fully initialized and then the old ones are subsequently
dropped. I'd be worried that I'd run out of memory when initializing the
larger segments. That being said, I only notice the large spikes when
merging via the explicit optimize/only_expunge_deletes command, the
continuous merging throughout the day results in very mild spikes by
comparison.

I don't see memory issues but I'm not really looking for them. Memory
usage has never been a problem for us. IO spikes were a problem the few
times I ran only_expunge_deletes.

I'm forming the opinion that calling _optimize should be a pretty
remarkable thing. Like it should only be required when:

1. You are done writing an index and will never touch it again and want
to save some space/make querying a bit faster.

2. You are working around some funky bug.

3. You've just built the index with funky merge settings that created a
bazillion segments but imported quickly.

4. You shouldn't be calling it. Stop now. You've made a mistake.

I think that #1 and #3 aren't valid for only_expunge_deletes though. So
that leaves either - you are working around a bug or you are making a
mistake.

In your case I think your mistake is taking the default merge settings.
Maybe. Or maybe that is a bug. I'm not sure. If it is a mistake then you
are in good company.

Also! only_expunge_deletes is kind of a trappy name - what it really does
is smash all the segments with deletes together into one big segment making
the max_merged_segment worse in the long run.

A steep spike in memory usage is probably not worth worrying about so long
as you don't see any full GCs done via stop the world (concurrent mode
failure). I'd expect to see more minor GCs during the spike and those are
stop the world but they should be pretty short. Elasticsearch should log
a WARNING or ERROR during concurrent mode failures. It also exposes
counters of all the time spent in minor and full GCs and you can jam those
into RRDtool to get some nice graphs. Marvel will probably do that for
you, I'm not sure. You can also use jstat -gcutil <pid> 1s 10000 to get
it to spit out the numbers in real time.
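
As a rough sketch of the counter-based approach: take two samples of a node's JVM stats and subtract them to see how many collections ran, and for how long, during a merge spike. The dictionary layout below (`jvm.gc.collectors.<name>.collection_count` and `collection_time_in_millis`) is an assumption about what the node stats API returns on your version, so check it against a real response first. The sample numbers are made up.

```python
from typing import Dict


def gc_delta(before: dict, after: dict) -> Dict[str, Dict[str, int]]:
    """Per-collector change in GC count and time between two samples of one node."""
    old = before["jvm"]["gc"]["collectors"]
    new = after["jvm"]["gc"]["collectors"]
    deltas = {}
    for name, stats in new.items():
        prev = old.get(name, {})
        deltas[name] = {
            "collections": stats["collection_count"] - prev.get("collection_count", 0),
            "millis": stats["collection_time_in_millis"]
            - prev.get("collection_time_in_millis", 0),
        }
    return deltas


# Made-up samples taken before and after a merge (assumed response shape).
sample_before = {"jvm": {"gc": {"collectors": {
    "young": {"collection_count": 100, "collection_time_in_millis": 2000},
    "old": {"collection_count": 2, "collection_time_in_millis": 900},
}}}}
sample_after = {"jvm": {"gc": {"collectors": {
    "young": {"collection_count": 130, "collection_time_in_millis": 2600},
    "old": {"collection_count": 2, "collection_time_in_millis": 900},
}}}}

for collector, change in gc_delta(sample_before, sample_after).items():
    print(collector, change)
```

In this made-up case only the young collector moved, which matches the "more minor GCs, no full GCs" pattern described above as harmless.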

I guess I could always add a single node with the higher settings and
just drop it if it becomes problematic in order to test (since, though
dynamic, prior to 1.4 the merge settings only take effect on shard
initialization if I remember correctly).

I'm pretty sure that is an index level setting. Also, I think there was
an issue with applying it live in some versions but I know it's fixed in
1.4. I'm pretty sure you can trick your way around the issue by moving the
shard to another node. It's kind of fun.

Thanks for the advice though, I'll definitely try that.

Good Luck!

Nik

--
Govind Chandrasekhar

Jonathan_Foy · December 4, 2014, 4:54pm

Hello

I do agree with both of you that my use of optimize as regular maintenance
isn't the correct way to do things, but it's been the only thing that I've
found that keeps the deleted doc count/memory under control. I very much
want to find something that works to avoid it.

I came to much the same conclusions that you did regarding the merge
settings and logic. It took a while (and eventually just reading the code)
to find out that though dynamic, the merge settings don't actually take
effect until a shard is moved/created (fixed in 1.4), so a lot of my early
work thinking I'd changed settings wasn't really valid. That said, my
merge settings are still largely what I have listed earlier in the thread,
though repeating them for convenience:

indices.store.throttle.type: none
index.merge.policy.reclaim_deletes_weight: 6.0 <-- This one I know is
quite high, I kept bumping it up before I realized the changes weren't
taking effect immediately
index.merge.policy.max_merge_at_once: 5
index.merge.policy.max_merge_at_once_explicit: 5
index.merge.policy.segments_per_tier: 5
index.merge.policy.max_merged_segment: 2gb
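
For anyone applying a list like this through the API rather than elasticsearch.yml, here is a minimal sketch that splits the keys by scope and prints the two request bodies. It assumes that the `index.*` keys are accepted by the index update-settings endpoint and that the `indices.*` key is accepted by the cluster update-settings endpoint as dynamic settings on your version. The index name `my_index` and the choice of `transient` are hypothetical. Nothing is sent over the network.

```python
import json

# Settings as listed above; values are kept as strings.
SETTINGS = {
    "indices.store.throttle.type": "none",
    "index.merge.policy.reclaim_deletes_weight": "6.0",
    "index.merge.policy.max_merge_at_once": "5",
    "index.merge.policy.max_merge_at_once_explicit": "5",
    "index.merge.policy.segments_per_tier": "5",
    "index.merge.policy.max_merged_segment": "2gb",
}


def split_by_scope(settings: dict):
    """Separate per-index keys (index.*) from node/cluster-wide keys (indices.*)."""
    index_level = {k: v for k, v in settings.items() if k.startswith("index.")}
    cluster_level = {k: v for k, v in settings.items() if k.startswith("indices.")}
    return index_level, cluster_level


index_body, cluster_level = split_by_scope(SETTINGS)
cluster_body = {"transient": cluster_level}  # "transient" is an assumption; "persistent" also exists

print("PUT /my_index/_settings")  # my_index is a hypothetical index name
print(json.dumps(index_body, indent=2, sort_keys=True))
print("PUT /_cluster/settings")
print(json.dumps(cluster_body, indent=2, sort_keys=True))
```

Whatever is sent, it is worth reading the settings back from the index afterwards, since a value that was never applied looks identical to one that was applied and then ignored.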

I DO have a mess of nested documents in the type that I know is the most
troublesome...perhaps the merge logic doesn't take deleted nested documents
into account when deciding what segment to merge? Or perhaps since I have
a small max_merged_segment, it's like Nikolas said and those max sized
segments are just rarely reclaimed in normal operation, and so the deleted
doc count (and the memory they take up) grows. I don't have memory issues
during normal merge operations, so I think I may start testing with a
larger max segment size.

I'll let you know if I ever get it resolved.

Jonathan_Foy · December 20, 2014, 6:44pm

I thought I should revisit this thread in case anyone else is repeating my
mistakes, which it turns out are multiple. On the bright side, I do seem
to have resolved my issues.

tl;dr: optimize was screwing me up, and the merge settings I thought I had
in place were not actually there/active. Once they were applied, all is well.

First, the regular use of optimize?only_expunge_deletes. I did not realize
at first that this command would in fact ignore the max_merged_segment
parameter (I thought I had checked it at one point, but I must not have).
While max_merged_segment was set to 2 GB, I ended up with segments as large
as 17 GB. I reindexed everything one weekend to observe merge behaviour
better and clear these out, and it wasn't until those segments were almost
completely full of deleted docs that they were merged out (they finally
vanished overnight, so I'm not exactly sure what the tipping point was, but
I do know they were at around 4/5 deleted at one point). Clearly my use of
optimize was putting the system in a state that only additional optimize
calls could clean, making the cluster "addicted" to the optimize call.
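
A sketch of how oversized segments like these could be spotted: walk a segments listing and report every segment above the configured cap, along with its share of deleted docs. The nested layout (`indices` > `shards` > shard copies > `segments`, with `num_docs`, `deleted_docs` and `size_in_bytes` per segment, where `num_docs` counts live docs only) is an assumption about the segments API response and should be checked against your version. The sample data is invented.

```python
GB = 1024 ** 3


def oversized_segments(segments_response: dict, cap_bytes: int):
    """Return (index, shard, segment, size_in_bytes, deleted_ratio) above the cap, largest first."""
    found = []
    for index_name, index_data in segments_response["indices"].items():
        for shard_id, copies in index_data["shards"].items():
            for copy in copies:
                for seg_name, seg in copy["segments"].items():
                    if seg["size_in_bytes"] <= cap_bytes:
                        continue
                    total = seg["num_docs"] + seg["deleted_docs"]
                    ratio = seg["deleted_docs"] / total if total else 0.0
                    found.append((index_name, shard_id, seg_name, seg["size_in_bytes"], ratio))
    return sorted(found, key=lambda row: row[3], reverse=True)


# Invented listing: one 17 GB segment that is 4/5 deleted, one small healthy one.
sample = {"indices": {"my_index": {"shards": {"0": [{"segments": {
    "_a": {"num_docs": 1_000_000, "deleted_docs": 4_000_000, "size_in_bytes": 17 * GB},
    "_b": {"num_docs": 500_000, "deleted_docs": 10_000, "size_in_bytes": 1 * GB},
}}]}}}}

for row in oversized_segments(sample, cap_bytes=2 * GB):
    index_name, shard_id, seg_name, size, ratio = row
    print(f"{index_name}[{shard_id}] {seg_name}: {size / GB:.1f} GB, {ratio:.0%} deleted")
```

Any hit above the cap suggests that something other than the normal merge policy (such as an explicit optimize) produced the segment.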

Second, and this is the more embarrassing thing, my changed merge settings
had mostly not taken effect (or were reverted at some point). After
removing all of the large segments via a full reindex, I added nodes to get
the system to a stable point where normal merging would keep the deleted
docs in check. It ended up taking 5/6 nodes to maintain ~30% delete
equilibrium and enough memory to operate, which was 2-3 more nodes than I
really wanted to dedicate. I decided then to bump the max_merged_segment
up as per Nikolas's recommendation above (just returning it to the default
5 GB to start with), but noticed that the index merge settings were not
what I thought they were. Sometime, probably months ago when I was trying
to tune things originally, I apparently made a mistake, though I'm still
not exactly sure when/where. I had the settings defined in the
elasticsearch.yml file, but I'm guessing those are only applied to new
indices when they're created, not existing indices that already have their
configuration set? I know I had updated some settings via the API at some
point, but perhaps I had reverted them, or simply not applied them to the
index in question. Regardless, the offending index still had mostly
default settings, only the max_merged_segment being different at 2 GB.
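
One way to catch this kind of drift earlier is to compare the intended values against what the index actually reports. The sketch below assumes you have already fetched the index's settings and extracted the part under `settings` for that index (the exact response layout is an assumption and may be nested or flat depending on version and parameters). The helper names are hypothetical, and the comparison is a plain string match, so `6` and `6.0` would be reported as different.

```python
def flatten(nested: dict, prefix: str = "") -> dict:
    """Turn {"index": {"merge": {...}}} into {"index.merge...": "value"}."""
    flat = {}
    for key, value in nested.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, path + "."))
        else:
            flat[path] = str(value)
    return flat


def settings_drift(intended: dict, index_settings: dict) -> dict:
    """Map each mismatching key to (intended, actual); actual is None if unset."""
    actual = flatten(index_settings)
    return {
        key: (str(want), actual.get(key))
        for key, want in intended.items()
        if actual.get(key) != str(want)
    }


intended = {
    "index.merge.policy.reclaim_deletes_weight": "6.0",
    "index.merge.policy.max_merge_at_once": "5",
    "index.merge.policy.segments_per_tier": "5",
    "index.merge.policy.max_merged_segment": "5gb",
}

# Invented example of an index that only ever received one override.
reported = {"index": {"number_of_shards": "5",
                      "merge": {"policy": {"max_merged_segment": "2gb"}}}}

for key, (want, have) in sorted(settings_drift(intended, reported).items()):
    print(f"{key}: intended={want} actual={have}")
```

An `actual=None` line means the index is running on the default for that key, which is the situation described above.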

I applied the settings above (plus the 5 GB max_merged_segment value) to
the cluster and then performed a rolling restart to let the settings take
effect. As each node came up, the deleted docs were quickly merged out of
existence and the node stabilized at ~3% deleted. CPU spiked to 100% while
this took place, disk didn't seem to be too stressed (it reported 25%
utilization when I checked via iostat at one point), but once the initial
clean-up was done things settled down, and I'm expecting smaller spikes as
it maintains the lower deleted percentage (I may even back down the
reclaim_deletes_weight). I need to see how it actually behaves during
normal load during the week before deciding everything is completely
resolved, but so far things look good, and I've been able to back down to
only 3 nodes again.
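
For tracking the deleted percentage over the following weeks, the calculation itself is small. This sketch assumes the two inputs are the live and deleted document counters reported per index (for example the `docs.count` and `docs.deleted` fields of the index stats, which is an assumption to verify). The sample numbers are invented.

```python
def deleted_percentage(live_docs: int, deleted_docs: int) -> float:
    """Deleted docs as a percentage of all docs still held in segments."""
    total = live_docs + deleted_docs
    return 100.0 * deleted_docs / total if total else 0.0


# Invented counters for a "before" and an "after" state.
before_pct = deleted_percentage(live_docs=700, deleted_docs=300)
after_pct = deleted_percentage(live_docs=970, deleted_docs=30)
print(f"before: {before_pct:.1f}% deleted, after: {after_pct:.1f}% deleted")
```

Graphing this value over time shows whether normal merging is holding the equilibrium or slowly losing ground.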

So, I've probably wasted dozens of hours and hundreds of dollars of server
time resolving what was ultimately a self-inflicted problem that should
have been fixed easily months ago. So it goes.
