Hi!
I need to reduce the size of a database by shrinking one field in a document. The problem is that the database is huge, and I also can't afford a significant drop in performance. I'm wondering whether I can reindex the documents and then run a _forcemerge. BUT I read that I can't write while doing that, so it doesn't seem like a good idea. Can someone point me to the best practice for removing old versions of documents after a reindex, on a database that is still being written to?
Where did you read that you cannot write while a merge is happening? Elasticsearch is doing merging in the background all the time while indexing happens.
"Force merge should only be called against read-only indices. Running force merge against a read-write index can cause very large segments to be produced (>5Gb per segment), and the merge policy will never consider it for merging again until it mostly consists of deleted docs. This can cause very large segments to remain in the shards."
Yes, this is important advice. Only run a force merge if the index will not receive any more updates. Otherwise, regular merges will run in the background at some point anyway.
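To make the "read-only first" rule concrete, here is a minimal sketch that refuses to build a force merge request unless the index has a write block. It assumes you pass in the `settings` object for one index as returned by `GET /<index>/_settings` (Elasticsearch returns setting values as strings). The block itself can be set with `PUT /<index>/_settings` and `index.blocks.write: true`. The index names are hypothetical, and `max_num_segments=1` is just a common choice for indices that are done being written.

```python
def is_write_blocked(index_settings: dict) -> bool:
    """Return True if index.blocks.write is set to true in the given settings."""
    blocks = index_settings.get("index", {}).get("blocks", {})
    # Elasticsearch reports setting values as strings, e.g. "true".
    return str(blocks.get("write", "false")).lower() == "true"


def forcemerge_request(index_name: str, index_settings: dict) -> tuple[str, str]:
    """Build a (method, path) pair for a force merge, refusing writable indices."""
    if not is_write_blocked(index_settings):
        # A writable index should be left to the background merge policy.
        raise ValueError(f"{index_name} still accepts writes; skip the force merge")
    return ("POST", f"/{index_name}/_forcemerge?max_num_segments=1")


# Hypothetical index that has already been made read-only.
readonly_settings = {"index": {"blocks": {"write": "true"}}}
print(forcemerge_request("my-index-v2", readonly_settings))
```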
To me, reindex (without a forcemerge) sounds like a good idea to start with.
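As a rough illustration of the reindex-only approach, this sketch builds a `_reindex` request body that copies every document but drops one field from `_source` with a Painless script. The index names and the field name `big_field` are hypothetical; if you only want to trim the field rather than remove it, the script would need to change accordingly. The same `script` object can also be passed to `_update_by_query` if you prefer to rewrite documents in place instead of indexing into a new index.

```python
import json


def reindex_body(source_index: str, dest_index: str, field: str) -> dict:
    """Build a _reindex request body that copies documents but drops one field."""
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        "script": {
            "lang": "painless",
            # Runs once per document during the reindex and removes the field from _source.
            "source": "ctx._source.remove(params.field)",
            "params": {"field": field},
        },
    }


# Hypothetical names; wait_for_completion=false returns a task you can poll.
body = reindex_body("my-index", "my-index-v2", "big_field")
print("POST /_reindex?wait_for_completion=false")
print(json.dumps(body, indent=2))
```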
As usual there is no clear answer. You might need more disk space during the reindex until the space savings kick in (especially when indexing into a new index), so make sure you can sustain that.
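A back-of-the-envelope way to check that headroom: if you reindex into a new index and keep the old one until the reindex finishes, both coexist on disk, and each has its replica copies too. This sketch assumes sizes are primary-store sizes (for example from `_cat/indices`), that the new index size is your own estimate, and it ignores the temporary extra space that background merges use, so treat the result as a lower bound on what you need free.

```python
def peak_disk_bytes(old_primary_bytes: int, new_primary_bytes: int, replicas: int) -> int:
    """Rough disk needed while the old and new indices coexist during a reindex.

    Assumes every primary shard has `replicas` full copies and ignores
    temporary space used by background merges.
    """
    copies = 1 + replicas
    return (old_primary_bytes + new_primary_bytes) * copies


GB = 1024 ** 3
# Hypothetical sizes: 500 GB old index, ~300 GB expected after shrinking the field, 1 replica.
print(peak_disk_bytes(500 * GB, 300 * GB, 1) // GB)  # 1600
```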