I recently saw this issue: "Don't set index.codec: 'best_compression' for TSDB data streams" (Issue #160288, elastic/kibana on github.com), and I was curious. What is the guidance for compression as part of the rollover of an index? Normally it has been something along the lines of: on rollover, force merge to 1 segment and enable best compression (for optimal search speed). Is this still the case, or is there a "better" way to handle TSDS indices during rollover (skip the force merge? don't use best compression?)
I wasn't able to find anything in the docs about this.
The index.codec index setting controls compression for stored fields in Lucene.
In Elasticsearch we use stored fields for the _source and the _id, which allows us to quickly look up values for documents that have matched (in the get API and in the fetch phase of the search API).
However, with TSDB the stored fields for _id and _source get trimmed away once they are no longer needed internally, to lower disk space usage. So the extra effort spent on best compression would be lost. With this in mind, we advise against setting index.codec=best_compression, or setting index.codec at all, for TSDB data streams.
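To make that advice concrete, here is a minimal Python sketch that drops index.codec from a set of proposed settings whenever index.mode is time_series, then builds an index template body from the result. The helper name `tsds_template_settings`, the pattern `metrics-example-*`, and the `host.name` routing path are hypothetical and only there for illustration; a real TSDS template would also need mappings that mark the routing path fields as dimensions. It only builds and prints the JSON body and does not call Elasticsearch.

```python
import json


def tsds_template_settings(settings: dict) -> dict:
    """Return a copy of flat index settings, without index.codec for TSDB indices.

    Hypothetical helper: settings for non-time-series indices are left unchanged.
    """
    cleaned = dict(settings)
    if cleaned.get("index.mode") == "time_series":
        # Leave the codec at its default for TSDB, per the advice above.
        cleaned.pop("index.codec", None)
    return cleaned


# Proposed settings that still carry the old "best_compression" habit.
proposed = {
    "index.mode": "time_series",
    "index.routing_path": ["host.name"],  # assumed dimension field
    "index.codec": "best_compression",
}

# Index template body for a data stream (illustrative pattern name).
template = {
    "index_patterns": ["metrics-example-*"],
    "data_stream": {},
    "template": {"settings": tsds_template_settings(proposed)},
}

print(json.dumps(template, indent=2))
```

The printed body no longer contains index.codec, while settings for regular (non-TSDB) indices pass through the helper untouched.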