I've ingested 120GB of data into an Elasticsearch index, and ~115GB of that is used to store the source documents. Since I don't need the source documents, only the IDs of the documents that match a query, I tried disabling storage of the source (`_source`). However, disk usage didn't go down, apparently because of an ES feature called soft deletes.
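For context, disabling `_source` is a mapping setting. As far as I know, `_source.enabled` cannot be changed on an existing index, so it has to be set on a new index and the data reindexed into it. Below is a minimal sketch of the request bodies involved, using only the Python standard library. The cluster URL `http://localhost:9200`, the index name `my-index-nosource`, and all helper functions are hypothetical, and the `send` helper has not been run against a live cluster.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local, unsecured cluster
INDEX = "my-index-nosource"       # hypothetical name for the new index


def index_body_without_source() -> dict:
    # Create-index body with _source storage disabled in the mapping.
    return {"mappings": {"_source": {"enabled": False}}}


def reindex_body(src: str, dest: str) -> dict:
    # _reindex body copying documents from the old index (which still
    # has _source) into the new index (which won't store it).
    return {"source": {"index": src}, "dest": {"index": dest}}


def ids_only_query(query: dict) -> dict:
    # Search body that asks for no _source in the hits.
    return {"query": query, "_source": False}


def matching_ids(search_response: dict) -> list:
    # Extract only the document IDs from a _search response.
    return [hit["_id"] for hit in search_response["hits"]["hits"]]


def send(method: str, path: str, body: dict) -> dict:
    # Hypothetical helper: send a JSON request and decode the JSON reply.
    req = urllib.request.Request(
        f"{ES_URL}{path}",
        data=json.dumps(body).encode("utf-8"),
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Usage would be along the lines of `send("PUT", f"/{INDEX}", index_body_without_source())`, then `send("POST", "/_reindex", reindex_body("old-index", INDEX))`, then `matching_ids(send("POST", f"/{INDEX}/_search", ids_only_query({"match_all": {}})))`. Note that an index without `_source` can no longer serve as the source of a reindex, update, or update-by-query.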