70-80% of the storage used by my indices is stored fields, that is, the .fdt files.
I am running in the cloud, and this becomes expensive, since I need quite large machines to hold the indices.
The total size of the indices is ~15-20 TB.
Therefore I am thinking about moving the stored fields onto a cheaper disk. I am aware that this will make queries slower, but is it possible to do this in an elegant and easy way?
I.e. by setting a pointer to the .fdt files' location somewhere in the settings?
I am running Elasticsearch 1.7.5, and I am aware that it would be preferable to upgrade, but that is currently not a possibility.
No, Elasticsearch does not support splitting a segment across multiple locations like this. If you are content with slower queries, why not put everything on the cheaper disk?
Because I would expect the performance penalty to be significantly smaller if I can keep the indices on the faster disks and only move the stored fields to the cheaper, slower disk.
The stored fields are, as far as I know, only used when returning hits, so only fetching the e.g. top 10 hits I return would have to go to the slower disk.
And since the stored fields are such a large percentage of the storage use, I think that would be a nice compromise.
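To make that reasoning concrete, here is a minimal Lucene sketch (ES 1.7.5 runs on Lucene 4.10) showing that scoring and ranking never touch the stored-fields files; only fetching the returned documents does. The index path and field names are hypothetical.

```java
import java.io.File;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.FSDirectory;

public class TopHitsTouchStoredFields {
    public static void main(String[] args) throws Exception {
        // Hypothetical index path; adjust to your environment.
        DirectoryReader reader = DirectoryReader.open(FSDirectory.open(new File("/data/index")));
        IndexSearcher searcher = new IndexSearcher(reader);

        // Scoring and ranking read the postings lists, norms, etc. --
        // not the stored-fields (.fdt/.fdx) files.
        TopDocs top = searcher.search(new TermQuery(new Term("body", "elasticsearch")), 10);

        // Only here, for the 10 hits actually returned, are the
        // stored fields read from disk.
        for (ScoreDoc sd : top.scoreDocs) {
            Document doc = searcher.doc(sd.doc);
            System.out.println(doc.get("title"));
        }
        reader.close();
    }
}
```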
I see. It's an interesting idea, and I don't know if the Elasticsearch team has previously discussed using something like a FileSwitchDirectory in this manner. I suspect it would be a bit of a pain to manage. It seems like a reasonable feature to request, so could you open a GitHub issue with this idea for wider discussion within the team?
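For reference, here is a rough sketch of what such routing could look like at the Lucene level with FileSwitchDirectory. This is not an Elasticsearch setting, and ES 1.7.5 exposes no hook for plugging it in; the mount points are hypothetical.

```java
import java.io.File;
import java.util.HashSet;
import java.util.Set;

import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.FileSwitchDirectory;

public class SplitStoredFields {
    public static void main(String[] args) throws Exception {
        // Hypothetical mount points: cheap/slow disk and fast disk.
        Directory slow = FSDirectory.open(new File("/mnt/slow/index"));
        Directory fast = FSDirectory.open(new File("/mnt/fast/index"));

        // Route the stored-fields files to the slow disk:
        // .fdt holds the stored field data, .fdx the index into it.
        Set<String> storedFieldExts = new HashSet<String>();
        storedFieldExts.add("fdt");
        storedFieldExts.add("fdx");

        // Files whose extension is in the set go to the primary (slow)
        // directory; everything else (postings, norms, doc values, ...)
        // goes to the secondary (fast) one. `true` closes both on close().
        Directory dir = new FileSwitchDirectory(storedFieldExts, slow, fast, true);

        // `dir` could then be handed to an IndexWriter or DirectoryReader,
        // but Elasticsearch itself offers no way to configure this.
        dir.close();
    }
}
```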