When I index/update many documents, the Elasticsearch index gets much bigger, and it takes about 30 minutes to shrink back down if no operations are made.
Some of the _settings I use:
"number_of_replicas" : 0
"requests.cache.enable": false
The index is originally around 250GB, but while indexing/updating it grows to 500GB or even 700GB, and my machine has only 900GB of total space. I use only one node.
My questions are:
1 - Why does the index get so much bigger?
2 - How can I reduce the size back to normal faster?
3 - Are there any specific settings that help with this?
4 - Is it possible to avoid it?
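For reference, a rough sketch of the calls that show these size numbers; my-index stands in for the real index name, and the column lists may vary slightly between versions:

# store size per index, in GB
GET /_cat/indices/my-index?v&h=index,docs.count,store.size,pri.store.size&bytes=gb

# disk used/available on the node
GET /_cat/allocation?v&h=node,shards,disk.used,disk.avail,disk.percent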
What type of storage are you using? Local SSDs? If not, it is possible that merging is falling behind, resulting in a temporarily enlarged index. It could also be that the updates take a while to make large shards available for merging, which will also increase size.
You might get a segment list and see what it's doing, how big the segments are, and whether they are merging (you have to use the API, as it's not in Kibana, I think).
Did you try a force merge (stopping indexing first)? There are some weird rules on this, but it should reduce size if it can merge (and you can see the results with the segment list).
How much are you updating, like 1% or much more? That can really bloat the storage until merge.
Segment list: GET /_cat/segments?v&s=index,size:desc
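Something like this, as a rough sketch (my-index is a placeholder; the second call pulls the merge stats so you can see current merges and throttled time):

# list segments for the index, largest first, with live and deleted doc counts
GET /_cat/segments/my-index?v&s=size:desc&h=index,shard,segment,size,docs.count,docs.deleted

# merge activity: current merges, total merge time, throttled time
GET /my-index/_stats/merge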
I'm updating some fields of almost every document, so close to 100%. Now I understand segments and merging better. I will try tuning index.merge.scheduler.max_thread_count to see if it works; if it does, I will report back here.
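For reference, this is the kind of dynamic settings call I have in mind; a sketch only, with my-index standing in for the real index name and the thread count just an illustrative value to tune against the node's CPU/IO:

# raise the number of merge threads the scheduler may use per shard
PUT /my-index/_settings
{
  "index.merge.scheduler.max_thread_count": 4
}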
Updating almost every document will definitely bloat your system. I think it's fair to say most people update none or nearly none of their documents, and 100%, especially if you do it repeatedly (like one field now, then another field updated later), is pretty unusual and will really bloat the data, as you see. You end up in a running IO battle with the merge system that kind of never ends.
Not sure if folks have advice for that, other than aggressive merging, but that's a lot of IO and can only go so fast, especially if you are updating often.
Looks like that thread setting is PER SHARD, which means it can do a lot of merging if the IO can keep up, but don't overload your node's IO/CPU.
Note there is an un/semi-documented setting for dynamic auto throttling. Since you have an unusual use case, you might turn this off, as it can throttle merges after sudden updates and allow bloating. It is not in the docs that I can see, so be careful with it (the thread limit will keep it from running away, though that's per shard, so you could still overload the CPU).
index.merge.scheduler.auto_throttle:
If this is true (the default), then the merge scheduler will rate-limit IO
(writes) for merges to an adaptive value depending on how many merges are
requested over time. An application with a low indexing rate that
unluckily suddenly requires a large merge will see that merge aggressively
throttled, while an application doing heavy indexing will see the throttle
move higher to allow merges to keep up with ongoing indexing.
There is also a completely undocumented setting, index.merge.scheduler.max_merge_count, which defaults to the above thread count + 5 and presumably acts as a global limit; so if you have 4 CPUs, this is 9. I don't suggest you play with it, but be aware of it: it will limit merges to 4 + 5 = 9 even if you have 24 CPUs. Anyway, you have a weird use case, so FYI.
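If you do experiment, something like this is the shape of it; a sketch only, with my-index as a placeholder and illustrative values, and if your version rejects any of these as non-dynamic you may need to close the index first:

# more merge threads, a matching max_merge_count (thread count + 5), and no adaptive IO throttling
PUT /my-index/_settings
{
  "index.merge.scheduler.max_thread_count": 4,
  "index.merge.scheduler.max_merge_count": 9,
  "index.merge.scheduler.auto_throttle": false
}

# confirm what the index is actually using
GET /my-index/_settings/index.merge.*?include_defaults=true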
Steve, thanks a lot for the clear and deep answer ;)
What do you think about this solution:
Stopping the update process whenever the index is bloated (2x its original size), then performing a _forcemerge, waiting for the bloat to reduce, and then going back to the update process, repeating this as many times as needed?
My worry is:
The documentation for force merge says "it can cause very large segments to remain in the index which can result in increased disk usage and worse search performance." However, this only applies WHILE the force merge process is running, right? I need to be sure of this.
About the auto_throttle:
Mateus and I tried setting index.merge.scheduler.auto_throttle to false, but we didn't see any difference in the bloating/increased index size, nor any difference in CPU and IO. Is there anything else you could suggest? Or could we have made some mistake in the process?
Maybe refresh_interval could be a solution:
Do you see changing the refresh_interval as a solution? We changed it to 10s and didn't see a difference; is it worth increasing it further?
For refresh_interval, I'd certainly set that longer, like 60s, as that'll cut down on all the activity going on and maybe free up more IO, etc. for merging.
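As a sketch, with my-index as a placeholder (assuming you can tolerate searches lagging the updates by a minute):

# refresh less often so more IO is left for merging
PUT /my-index/_settings
{
  "index.refresh_interval": "60s"
}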
For force merges, yes, you need to stop writes first, and yes, I'd expect that doing it periodically would help a lot, though you'll have to test how long it takes vs. how many updates you've done, etc. I have no idea about the dynamics of lots of updates, which might require merging all the existing segments; that's kind of unusual.
Please report your findings, as it's an interesting case, and also how much you really update, e.g. every doc every day, or multiple times, etc., as it's a good study of heavy updates and related issues; not commonly seen, I think. Also, given the segment loads and merges, it would be interesting to see how long force merges take, their sizes, etc.
I guess my final conclusion is that if our disk starts to fill up too fast, we have to stop our update processes and wait for a merge to occur. The problem is that this merge can happen 30 minutes or 4 hours after we stop the update processes; we don't know, and I don't think we have much control. Force merge does not work well in this situation. I hope the ES team releases a better way to deal with this in future versions.
Why is force merge bad, if you can stop the update process? Merging is always happening; I'd think the question is how fast, and how to make it more aggressive. I'd think the settings above would help with that, and you can watch in the queues how many merge threads are running and try to increase that (maybe increase the queue if you can; not sure).
If I stop the update process and do a force merge: for example, when the index is at 354GB (original size 250GB) and I force merge it, the size drops to near 350GB, and the process takes less than 10 seconds (CPU/IO usage is high during that time).
If I try again, the size stays stuck at 350GB (CPU/IO does not spike), and only after some minutes or hours (during that "sleeping ES auto-merge" period CPU/IO activity never gets as high as during the first force merge) does the index get a few seconds of HIGH CPU/IO usage and merge down.
The full merge (back to 250GB) happens at a random point in time: when it happens it is very fast and consumes a lot of IO for a few seconds, but it takes a long time to start (and force merge did not work to reduce the size back to 250GB). The image of the random auto-merge follows:
Hmm, I have no idea, though there are some cases where it won't merge. It seems odd to grow from 250GB to 354GB during updates/deletes and then have force merge not push it back down, yet later it goes back down on its own via normal merges.
Ah, there is a note about index.merge.policy.expunge_deletes_allowed: it defaults to 10%, so if a segment has less than 10% deletes, the force merge won't do anything with it; this may be why your second force merge doesn't work.
Have you tried various segment count options in force merge, via max_num_segments?
And have you tried only_expunge_deletes, which is really what you are trying to do, as it will just replace the existing segments with ones without the old docs? That seems ideal for you.
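As a sketch of those variants, with my-index as a placeholder (use only one of the two parameters per call; the expunge_deletes_allowed change assumes that merge-policy setting is dynamic on your version):

# plain force merge (what was run above)
POST /my-index/_forcemerge

# only rewrite segments to drop deleted docs - closest to what heavy updates leave behind
POST /my-index/_forcemerge?only_expunge_deletes=true

# or force down to a fixed number of segments per shard
POST /my-index/_forcemerge?max_num_segments=1

# optional: lower the 10% threshold mentioned above so expunging touches more segments
PUT /my-index/_settings
{
  "index.merge.policy.expunge_deletes_allowed": 0
}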
Note also this key warning in the force merge docs. It seems to be the reverse of what you are seeing: they say that writing to an index with very large segments will make auto merge skip those segments, whereas you are seeing auto merge work where the manual one does not. Regardless, this issue may affect you, as those segments won't be auto merged until they mostly consist of deleted docs.
Force merge should only be called against an index after you have finished writing to it. Force merge can cause very large (>5GB) segments to be produced, and if you continue to write to such an index then the automatic merge policy will never consider these segments for future merges until they mostly consist of deleted documents. This can cause very large segments to remain in the index which can result in increased disk usage and worse search performance.