Elasticsearch best_compression

Hello,

I have a bunch of indices that I've remotely re-indexed into a single node as a sort of backup. These won't need to be written to anymore, so I'd like to make them smaller if possible.

What would be the correct way of setting index.codec to best_compression on these existing indices?

Many thanks for your help

Hi @VamPikmin

You will notice from the docs here that index.codec is a static setting, meaning it cannot be changed while the index is open, and if it is changed it does not take effect until the index's segments are merged:

index.codec
The default value compresses stored data with LZ4 compression, but this can be set to best_compression which uses DEFLATE for a higher compression ratio, at the expense of slower stored fields performance. If you are updating the compression type, the new one will be applied after segments are merged. Segment merging can be forced using force merge.

So you will need to close the index, apply the index.codec setting, open the index, and then force merge the segments. The force merge may take time and disk space depending on the size of the indices.

# Close the index so the static setting can be changed
POST my-index/_close

# Apply the new codec
PUT my-index/_settings
{
  "index.codec": "best_compression"
}

# Reopen the index
POST my-index/_open

# Merge the segments so the new codec is applied to the stored data
POST my-index/_forcemerge?max_num_segments=1
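If you want to confirm the setting took effect before merging, you can read it back (filter_path just trims the response to the relevant part):

GET my-index/_settings?filter_path=*.settings.index.codec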

Curious why you are not just using snapshots for this

Hey @stephenb

Thanks for your help.
No reason other than that I haven't looked into snapshots properly.
Now that I've learnt from you that it's the way to go, I'll do some research.

I'm looking at this snapshot setting: Read-only
Only one cluster should have write access to this repository. All other clusters should be read-only.

If I have two single nodes that I would like to snapshot to one backup server, can they both write to it, provided I do manual snapshots at different times to avoid corruption? Or do I set up two separate NFS shares on the backup server, where each node has write access to one share?

If they are each single-node clusters... then only one can write to the repository.

Better / recommended would be 2 repositories, which can just be 2 different paths... each with only one writer.
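A minimal sketch of what registering those could look like, assuming the backup server exports two NFS paths that are mounted on the respective nodes and listed in each node's path.repo setting (the repository names and paths here are just examples):

# On node 1's cluster: its own writable repository
PUT _snapshot/backups_node1
{
  "type": "fs",
  "settings": {
    "location": "/mnt/es-backups/node1"
  }
}

# On node 2's cluster: a separate repository on its own path
PUT _snapshot/backups_node2
{
  "type": "fs",
  "settings": {
    "location": "/mnt/es-backups/node2"
  }
}

If one cluster ever needs to browse the other's snapshots, it can register that other path with "readonly": true, so there is still only a single writer per path.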

Thank you so much @stephenb

From what I've read so far, I can create a manual snapshot per index without a policy, as I don't want to take a snapshot of the whole cluster?
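Something like this, I assume (the repository, snapshot, and index names are just examples):

PUT _snapshot/backups_node1/monthly-2023.06
{
  "indices": "monthly-2023.06",
  "include_global_state": false
}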

I'm hoping to replace my current process. What I've been doing up until now is:

  1. Re-index the last month's daily indices into a monthly index
  2. Remote re-index the latest monthly index to my "es-backup" node and delete it on the production node to free up the space
  • I realize I'm running two expensive operations here, but even remote re-index is a new concept for me. I'm guessing I should be able to remote-reindex the daily indices into a monthly index in one step, instead of doing it twice (roughly as sketched below).
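For reference, this is roughly the remote reindex call I have in mind on the es-backup node (the host and index names are placeholders, and the production host would also need to be allowed in reindex.remote.whitelist):

POST _reindex
{
  "source": {
    "remote": {
      "host": "http://production-node:9200"
    },
    "index": "daily-2023.06.*"
  },
  "dest": {
    "index": "monthly-2023.06"
  }
}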

Furthermore, if snapshot compression only applies to metadata, should I first apply best_compression and force merge on the index before creating the snapshot?
