I have an index that is about 2.5 times larger than the raw data size. The estimated index size is around 6TB for 40 million text documents. A key reason for this is the use of shingles.
I have a few concerns about having a large index, including long re-index times during version upgrades or index recovery. Are there other areas to be concerned about with large indexes, e.g. indexing speed, search speed, or the stability of Elasticsearch?
I am thinking of enabling compression on the indexes to save some storage space. Is compression enabled by default in Elasticsearch?
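For context, this is the setting I was planning to use — `index.codec: best_compression`, which switches stored fields from the default LZ4 codec to DEFLATE. My understanding (please correct me if wrong) is that the index must be closed before this setting can be updated, and that it only affects newly written segments, so a force-merge or reindex is needed to recompress existing data:

```
PUT /my_index/_settings
{
  "index": {
    "codec": "best_compression"
  }
}
```

(`my_index` is just a placeholder name here.) Note that stored fields are only part of the index; the shingle-heavy part of the footprint lives in the inverted index, which this codec setting does not change.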