I have a bunch of indices that I've remotely reindexed onto a single node as a sort of backup. They won't need to be written to anymore, so I'd like to make them smaller if possible.
What would be the correct way of setting index.codec to best_compression on these existing indices?
You will notice from the docs here that index.codec is a static setting, meaning it cannot be changed while the index is open, and once changed it does not take effect until the index's segments are merged:
index.codec
The default value compresses stored data with LZ4 compression, but this can be set to best_compression which uses DEFLATE for a higher compression ratio, at the expense of slower stored fields performance. If you are updating the compression type, the new one will be applied after segments are merged. Segment merging can be forced using force merge.
So you will need to close the index, apply the index.codec setting, reopen the index, and then force merge the segments. The force merge may take time and disk space depending on the size of the indices:
POST my-index/_close

PUT my-index/_settings
{
  "index.codec": "best_compression"
}

POST my-index/_open

POST my-index/_forcemerge?max_num_segments=1
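Once the force merge completes, you can confirm the codec took effect and that the segments were merged down. Something like the following should do it (the index name is just the placeholder from above):

// check the codec now set on the index
GET my-index/_settings?filter_path=*.settings.index.codec

// list the remaining segments; after the force merge there should be one per shard
GET _cat/segments/my-index?v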
Curious why you are not just using snapshots for this?
Thanks for your help.
No reason other than that I haven't looked into snapshots properly.
Now that I've learned from you that it's the way to go, I'll do some research.
I'm looking at this snapshot setting:

Read-only: Only one cluster should have write access to this repository. All other clusters should be read-only.
I have two single nodes that I would like to snapshot to one backup server. Can they both write to the same repository, provided I take manual snapshots at different times to avoid corruption? Or should I set up two separate NFS shares on the backup server, where each node has write access to its own share?
From what I've read so far, I can create a manual snapshot per index without a policy, since I don't want to snapshot the whole cluster?
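For what it's worth, this is roughly what I have in mind; the repository name, NFS path, and index name are placeholders, I'm assuming one fs repository per node so that only one cluster ever writes to each path, and the location has to be listed under path.repo in elasticsearch.yml:

// register an fs repository on the node's own NFS share (path is a placeholder)
PUT _snapshot/node_a_backup
{
  "type": "fs",
  "settings": {
    "location": "/mnt/backup/node-a",
    "compress": true
  }
}

// snapshot a single index rather than the whole cluster
PUT _snapshot/node_a_backup/monthly-2024.01?wait_for_completion=true
{
  "indices": "monthly-2024.01",
  "include_global_state": false
}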
I'm hoping to replace my current process. What I've been doing up until now is:

1. Re-index the last month's daily indices into a monthly index
2. Remote reindex the last monthly index to my "es-backup" node and delete it on the production node to free up the space

I realize I'm running two expensive operations here, but even remote reindex is a new concept for me. I'm guessing I should be able to remote-reindex the daily indices straight into a monthly index, instead of doing it twice; something like the sketch below is what I'm picturing.
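The host and index names here are made up, I'm assuming the wildcard resolves on the remote side the same way it does locally, and the remote host has to be allowed via reindex.remote.whitelist in elasticsearch.yml:

// pull last month's daily indices from production into one monthly index on es-backup
POST _reindex
{
  "source": {
    "remote": {
      "host": "http://production-node:9200"
    },
    "index": "daily-2024.01.*"
  },
  "dest": {
    "index": "monthly-2024.01"
  }
}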
Furthermore, if snapshot compression only applies to metadata, should I first apply best_compression and force merge the index before creating the snapshot?