Index gets much bigger when indexing or updating

Hi people!

When I index/update many documents, the Elasticsearch index gets much bigger, and it takes about 30 minutes to shrink back down once no operations are being made.

Some of the _settings that I use:
"number_of_replicas" : 0
"requests.cache.enable": false

The index is originally around 250GB, but when indexing/updating it grows to 500GB or even 700GB, and my machine has only 900GB of total space. I use only one node.

My questions are:
1 - Why does it get so much bigger?
2 - How can I reduce the size back to normal faster?
3 - Are there any specific settings that help with that?
4 - Is it possible to avoid it?

Appreciate any help,
Thanks so much!

What type of storage are you using? Local SSDs? If not, it is possible that merging is falling behind, resulting in a temporarily enlarged index. It could also be that the updates take a while to make large shards available for merging, which will also increase size.


Hi Christian!

I use an NVMe SSD and 30GB of RAM for ES. It is a dedicated server, using 12 shards for the index.

This is a huge problem for us because we have only 900GB total for a 250GB index. If we go over 900GB everything will crash...

Any solutions for that?


With NVMe you'd think it'd be fast.

  • What version is this?
  • Single node?
  • How do you know the index is 250GB?
  • You might get a segment list and see what it's doing, how big the segments are, and if they are merging (you have to use the API, as it's not in Kibana I think)
  • Did you try a force merge (stopping indexing first)? There are some weird rules on this, but it should reduce the size if it can merge (and you can see the results with the segment list); see the example after the segment list command below.
  • How much are you updating, like 1% or much more? That can really bloat the storage until a merge happens.

Segment list: GET /_cat/segments?v&s=index,size:desc
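
For example (my-index is a placeholder; stop indexing/updating first):

  # with writes stopped, kick off a force merge
  POST /my-index/_forcemerge

  # then check what the segments look like afterwards
  GET /_cat/segments/my-index?v&s=size:desc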


I think you need to create your own thread please @spdoes

Yes, the SSD is very fast.

  • "version" : "7.8.0"
  • yes, single node
  • when I do a reindex it gets to near 190GB, but most of the time (when it gets stable after updating) it is near 250GB
  • (screenshots of the segment list attached)
  • we are updating some fields of almost every document, close to 100%. Now I understand segments and merging better; I will try tuning index.merge.scheduler.max_thread_count to see if it works (sketch below), and if it does I will write here
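
A sketch of what I plan to try (assuming the setting is dynamic on our version; my-index and the value of 4 are just what we will test with):

  PUT /my-index/_settings
  {
    "index.merge.scheduler.max_thread_count" : 4
  }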

Updating almost every document will definitely bloat your system - I think it's fair to say most people update none or nearly none of their docs, and updating 100%, especially if you do it repeatedly (like one field now, then another field updated later), is pretty unusual and will really bloat the data, as you see - then you're in a running IO battle with the merge system that kinda never ends.

Not sure if folks have advice for that, other than aggressive merging, but I think that's a lot of IO and can only go so fast, especially if you are often updating.

Looks like that thread setting is PER SHARD, which means it can do a lot of merging if the IO can keep up, but be careful not to overload your node's IO/CPU.


Note there is an un/semi-documented setting for dynamic auto throttling - since you have an unusual use case, you might turn this off, as it can throttle merges after sudden bursts of updates, allowing bloating. This is not in the docs that I can see, so be careful with it (the thread limit will keep it from running away, though that's per shard, so you could still overload the CPU).

index.merge.scheduler.auto_throttle:

 If this is true (the default), then the merge scheduler will rate-limit IO
 (writes) for merges to an adaptive value depending on how many merges are
 requested over time.  An application with a low indexing rate that
 unluckily suddenly requires a large merge will see that merge aggressively
 throttled, while an application doing heavy indexing will see the throttle
 move higher to allow merges to keep up with ongoing indexing.
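
If you want to experiment with turning it off, something like this should do it (assuming the setting is dynamic on 7.8; my-index is a placeholder):

  PUT /my-index/_settings
  {
    "index.merge.scheduler.auto_throttle" : false
  }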

There is also a completely undocumented setting, index.merge.scheduler.max_merge_count, which is the above thread count + 5 and presumably acts as a global limit - so if you have 4 merge threads, this is 9. I don't suggest you play with this, but be aware of it - it will limit merges to thread count + 5, i.e. 9, even if you have 24 CPUs. Anyway, you have a weird use case, so FYI.

See the code.


Steve, thanks a lot for the clear and deep answer ;)

  1. What do you think about this solution:
    Stopping the update process whenever the index is bloated (2x its original size), performing a _forcemerge, waiting for the bloat to reduce, and then going back to the update process? And repeating this as many times as needed (rough sketch below).
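
Roughly what I mean, as console calls (the 2x check and the stopping/resuming of our updater happen in our own application; my-index is a placeholder):

  # 1 - watch the index size; when store.size is ~2x the original, stop our updater
  GET /_cat/indices/my-index?v&h=index,store.size,docs.count

  # 2 - with updates stopped, force merge and wait for disk usage to drop
  POST /my-index/_forcemerge

  # 3 - resume the update process and repeat as needed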

my worry is:

  1. The documentation for forcemerge says "it can cause very large segments to remain in the index which can result in increased disk usage and worse search performance." However, is this only WHILE the forcemerge process is running? I need to be sure of this.

about the auto_throttle:

  1. Mateus and I tried setting index.merge.scheduler.auto_throttle to false, however we didn't see any difference in reducing the bloating/increased index size, and we didn't see any difference in CPU and IO either. Is there anything else you could suggest? Or could it be that we made some mistake in the process?

maybe refresh_interval could be a solution:

  1. Is changing the refresh_interval something you see as a solution? We changed it to 10s and didn't see a difference; is it worth increasing it further?

For refresh_interval, I'd certainly set that longer, like 60s as that'll cut down on all the activity going on and maybe allow more IO, etc. for merging.
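
For example (my-index is a placeholder):

  PUT /my-index/_settings
  {
    "index.refresh_interval" : "60s"
  }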

For force merges, yes, you need to stop writes first, and yes, I'd expect that doing that periodically would help a lot, though you'll have to test how long it takes vs. how many updates you've done, etc. I have no idea about the dynamics of lots of updates, which might require merging all the existing segments; that's kinda unusual.


We will try the 60s refresh interval and the forcemerge process.
Thanks Steve! :smile:


Please report your findings as it's an interesting case - and also how much you really update, e.g. update every doc every day, or update multiple times, etc. as it's a good study of heavy updates and related issues; not commonly seen, I think. Also given segment loads and merges, interesting to see how long force merges take, their sizes, etc.


Unfortunately those efforts did not work; the index still keeps getting 2-3x larger during the update processes.


As a report of the last hour and the last week, here is the usage of the index and the node: (screenshots attached)

I guess my final conclusion is that if our disk starts to fill up too fast, we have to stop our update processes and wait for the merge to occur. The problem is that this merge can happen 30 minutes or 4 hours after we stop the update processes; we don't know, and I think we don't have much control over it. Forcemerge does not work well in this situation. I hope the ES team releases a way to deal with this better in future releases.


Why is force merge bad, if you can stop the update process? Merging is always happening, though I'd think the question is how fast, and how to make it more aggressive - I'd think the above settings would help that, and you can watch in the queues how many merge threads are running, trying to increase that (maybe increase the queue if you can; not sure).

  • If I stop the update process and do a forcemerge - for example, the index is at 354GB (original size is 250GB) and I forcemerge it - the size goes to near 350GB, and the process takes less than 10 seconds (CPU/IO usage is high during that time).

  • If I try again, the size stays stuck at 350GB (CPU/IO does not get high), and only after some minutes or hours does the index get a few seconds of HIGH CPU/IO usage and merge (during that "sleeping ES auto-merge" period, CPU/IO activity never gets as high as the first time I ran the forcemerge).

  • The full merge (to get back to 250GB) happens at a random point in time; when it happens it is very fast and consumes a lot of IO for a few seconds, but it takes too long to start (and forcemerge did not work to reduce the size back to 250GB). Here is an image of the random auto-merge: (screenshot attached)

Hmm, I have no idea, though there are some cases where it won't merge - it seems odd to grow from 250GB to ~350GB during updates/deletes and have force merge not push it back down, but then later have it go back down on its own via normal merges.

Ah, there is a note about index.merge.policy.expunge_deletes_allowed - it defaults to 10%, so if there are less than 10% deletes, the force merge won't do anything; this may be why your 2nd force merge doesn't do anything.

Have you tried various segment count options in force merge, via max_num_segments?

Have you also tried only_expunge_deletes? That is really what you are trying to do, as it will just replace existing segments with ones without the old docs; it seems ideal for you.
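
Something like this (my-index is a placeholder; again, stop writes first):

  POST /my-index/_forcemerge?only_expunge_deletes=true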

Note also, this key warning in the Force Merge Docs, though this seems to be the reverse of what you are seeing; as they say writing to a large segment will then skip it in auto merge, but you are seeing auto merge work where manual does not; regardless, this issue may affect you as it won't auto merge until you update most docs.

Force merge should only be called against an index after you have finished writing to it. Force merge can cause very large (>5GB) segments to be produced, and if you continue to write to such an index then the automatic merge policy will never consider these segments for future merges until they mostly consist of deleted documents. This can cause very large segments to remain in the index which can result in increased disk usage and worse search performance.

Thank you so much for those ways to fix it.

I will test them and then I will be back with the results.
