When I index/update many documents, the Elasticsearch index gets much bigger, and it takes about 30 minutes to shrink back down if no operations are made.
Some of the _settings I use:
"number_of_replicas" : 0
"requests.cache.enable": false
The index is originally around 250GB, but while indexing/updating it grows to 500GB or even 700GB, and my machine has only 900GB of total space. I use only one node.
My questions are:
1 - Why does the index get so much bigger?
2 - How can I reduce the size back to normal faster?
3 - Are there any specific settings that help with this?
4 - Is it possible to avoid it?
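For reference, a rough sketch of the calls that show these size numbers; my-index stands in for the real index name, and the column lists may vary slightly between versions:

# store size per index, in GB
GET /_cat/indices/my-index?v&h=index,docs.count,store.size,pri.store.size&bytes=gb

# disk used/available on the node
GET /_cat/allocation?v&h=node,shards,disk.used,disk.avail,disk.percent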
What type of storage are you using? Local SSDs? If not, it is possible that merging is falling behind, resulting in a temporarily enlarged index. It could also be that the updates take a while to make large shards available for merging, which will also increase size.
You might get a segment list and see what it's doing, how big the segments are, and whether they are merging (you have to use the API, as it's not in Kibana, I think).
Did you try a force merge (stopping indexing first)? There are some weird rules on this, but it should reduce size if it can merge (and you can see the results with the segment list).
How much are you updating, like 1% or much more? That can really bloat the storage until merge.
Segment list: GET /_cat/segments?v&s=index,size:desc
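Something like this, as a rough sketch (my-index is a placeholder; the second call pulls the merge stats so you can see current merges and throttled time):

# list segments for the index, largest first, with live and deleted doc counts
GET /_cat/segments/my-index?v&s=size:desc&h=index,shard,segment,size,docs.count,docs.deleted

# merge activity: current merges, total merge time, throttled time
GET /my-index/_stats/merge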
I'm updating some fields of almost every document, so close to 100%. Now I understand segments and merging better. I will try tuning index.merge.scheduler.max_thread_count to see if it works; if it does, I will report back here.
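For reference, this is the kind of dynamic settings call I have in mind; a sketch only, with my-index standing in for the real index name and the thread count just an illustrative value to tune against the node's CPU/IO:

# raise the number of merge threads the scheduler may use per shard
PUT /my-index/_settings
{
  "index.merge.scheduler.max_thread_count": 4
}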
Updating almost every document will definitely bloat your system. I think it's fair to say most people update none or nearly none of their documents, and 100%, especially if you do it repeatedly (like one field now, then another field updated later), is pretty unusual and will really bloat the data, as you see. You end up in a running IO battle with the merge system that kind of never ends.
Not sure if folks have advice for that, other than aggressive merging, but that's a lot of IO and can only go so fast, especially if you are updating often.
Looks like that thread setting is PER SHARD, which means it can do a lot of merging if the IO can keep up, but don't overload your node's IO/CPU.
Note there is an un/semi-documented setting for dynamic auto throttling. Since you have an unusual use case, you might turn this off, as it can throttle merges after sudden updates and allow bloating. It is not in the docs that I can see, so be careful with it (the thread limit will keep it from running away, though that's per shard, so you could still overload the CPU).
index.merge.scheduler.auto_throttle:
If this is true (the default), then the merge scheduler will rate-limit IO
(writes) for merges to an adaptive value depending on how many merges are
requested over time. An application with a low indexing rate that
unluckily suddenly requires a large merge will see that merge aggressively
throttled, while an application doing heavy indexing will see the throttle
move higher to allow merges to keep up with ongoing indexing.
There is also a completely undocumented setting, index.merge.scheduler.max_merge_count, which defaults to the above thread count + 5 and presumably acts as a global limit; so if you have 4 CPUs, this is 9. I don't suggest you play with it, but be aware of it: it will limit merges to 4 + 5 = 9 even if you have 24 CPUs. Anyway, you have a weird use case, so FYI.
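If you do experiment, something like this is the shape of it; a sketch only, with my-index as a placeholder and illustrative values, and if your version rejects any of these as non-dynamic you may need to close the index first:

# more merge threads, a matching max_merge_count (thread count + 5), and no adaptive IO throttling
PUT /my-index/_settings
{
  "index.merge.scheduler.max_thread_count": 4,
  "index.merge.scheduler.max_merge_count": 9,
  "index.merge.scheduler.auto_throttle": false
}

# confirm what the index is actually using
GET /my-index/_settings/index.merge.*?include_defaults=true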
Steve, thanks a lot for the clear and deep answer ;)
What do you think about this solution:
Stopping the update process whenever the index is bloated (2x its original size), then performing a _forcemerge, waiting for the bloat to reduce, and then going back to the update process, repeating this as many times as needed?
My worry is:
The documentation for force merge says "it can cause very large segments to remain in the index which can result in increased disk usage and worse search performance." However, this only applies WHILE the force merge process is running, right? I need to be sure of this.
About the auto_throttle:
Mateus and I tried setting index.merge.scheduler.auto_throttle to false, but we didn't see any difference in the bloating/increased index size, nor any difference in CPU and IO. Is there anything else you could suggest? Or could we have made some mistake in the process?
Maybe refresh_interval could be a solution:
Do you see changing the refresh_interval as a solution? We changed it to 10s and didn't see a difference; is it worth increasing it further?
For refresh_interval, I'd certainly set that longer, like 60s, as that'll cut down on all the activity going on and maybe free up more IO, etc. for merging.
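As a sketch, with my-index as a placeholder (assuming you can tolerate searches lagging the updates by a minute):

# refresh less often so more IO is left for merging
PUT /my-index/_settings
{
  "index.refresh_interval": "60s"
}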
For force merges, yes, you need to stop writes first, and yes, I'd expect that doing it periodically would help a lot, though you'll have to test how long it takes vs. how many updates you've done, etc. I have no idea about the dynamics of lots of updates, which might require merging all the existing segments; that's kind of unusual.
Please report your findings, as it's an interesting case, and also how much you really update, e.g. every doc every day, or multiple times, etc., as it's a good study of heavy updates and related issues; not commonly seen, I think. Also, given the segment loads and merges, it would be interesting to see how long force merges take, their sizes, etc.
I guess my final conclusion is that if our disk starts to fill up too fast, we have to stop our update processes and wait for a merge to occur. The problem is that this merge can happen 30 minutes or 4 hours after we stop the update processes; we don't know, and I don't think we have much control. Force merge does not work well in this situation. I hope the ES team releases a better way to deal with this in future versions.
Why is force merge bad, if you can stop the update process? Merging is always happening; I'd think the question is how fast, and how to make it more aggressive. I'd think the settings above would help with that, and you can watch in the queues how many merge threads are running and try to increase that (maybe increase the queue if you can; not sure).
If I stop the update process and do a force merge: for example, when the index is at 354GB (original size 250GB) and I force merge it, the size drops to near 350GB, and the process takes less than 10 seconds (CPU/IO usage is high during that time).
If I try again, the size stays stuck at 350GB (CPU/IO does not spike), and only after some minutes or hours (during that "sleeping ES auto-merge" period CPU/IO activity never gets as high as during the first force merge) does the index get a few seconds of HIGH CPU/IO usage and merge down.
The full merge (back to 250GB) happens at a random point in time: when it happens it is very fast and consumes a lot of IO for a few seconds, but it takes a long time to start (and force merge did not work to reduce the size back to 250GB). The image of the random auto-merge follows:
Hmm, I have no idea, though there are some cases where it won't merge. It seems odd to grow from 250GB to 354GB during updates/deletes and then have force merge not push it back down, yet later it goes back down on its own via normal merges.
Ah, there is a note about index.merge.policy.expunge_deletes_allowed: it defaults to 10%, so if a segment has less than 10% deletes, the force merge won't do anything with it; this may be why your second force merge doesn't work.
Have you tried various segment count options in force merge, via max_num_segments?
And have you tried only_expunge_deletes, which is really what you are trying to do, as it will just replace the existing segments with ones without the old docs? That seems ideal for you.
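As a sketch of those variants, with my-index as a placeholder (use only one of the two parameters per call; the expunge_deletes_allowed change assumes that merge-policy setting is dynamic on your version):

# plain force merge (what was run above)
POST /my-index/_forcemerge

# only rewrite segments to drop deleted docs - closest to what heavy updates leave behind
POST /my-index/_forcemerge?only_expunge_deletes=true

# or force down to a fixed number of segments per shard
POST /my-index/_forcemerge?max_num_segments=1

# optional: lower the 10% threshold mentioned above so expunging touches more segments
PUT /my-index/_settings
{
  "index.merge.policy.expunge_deletes_allowed": 0
}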
Note also this key warning in the force merge docs. It seems to be the reverse of what you are seeing: they say that writing to an index with very large segments will make auto merge skip those segments, whereas you are seeing auto merge work where the manual one does not. Regardless, this issue may affect you, as those segments won't be auto merged until they mostly consist of deleted docs.
Force merge should only be called against an index after you have finished writing to it. Force merge can cause very large (>5GB) segments to be produced, and if you continue to write to such an index then the automatic merge policy will never consider these segments for future merges until they mostly consist of deleted documents. This can cause very large segments to remain in the index which can result in increased disk usage and worse search performance.