Elasticsearch index size less than dataset disk space

Hello there,
I'm running Elasticsearch 7.10.1 and working with full_data.csv (14.12 GB) from the MeDAL Dataset on Kaggle. I split it into 179 CSV files of 300,000 lines each and loaded them into Elasticsearch. However, I can see now that the resulting index is smaller than the original dataset: it comes to 12.9gb. The number of documents loaded is the same as in the original. For all columns from the original file I set the analyzer type and English stopwords. My intuition tells me that the index should be larger than the original. Is it possible for the index to take up less disk space than the source data?

Yes, it's possible. Elasticsearch compresses indexed data by default (stored fields, including _source, use LZ4 compression unless index.codec is changed).
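If you want to check the numbers yourself, here is a minimal sketch in Python that reads the primary-shard store size from the index stats API and compares it with the size of the source file. The index name "medal", the URL http://localhost:9200, and the local path to full_data.csv are assumptions for illustration, not details from the post; the helper names are made up as well.

```python
import json
import os
from urllib.request import urlopen


def primary_store_bytes(stats: dict, index: str) -> int:
    """Return the on-disk size of the primary shards from a `_stats/store` response."""
    return stats["indices"][index]["primaries"]["store"]["size_in_bytes"]


def size_ratio(index_bytes: int, source_bytes: int) -> float:
    """Index size as a fraction of the source size (below 1.0 means the index is smaller)."""
    return index_bytes / source_bytes


def fetch_store_stats(base_url: str, index: str) -> dict:
    """Call GET /<index>/_stats/store on an unsecured cluster and parse the JSON body."""
    with urlopen(f"{base_url}/{index}/_stats/store") as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Assumed values: adjust to your own cluster, index name and file location.
    stats = fetch_store_stats("http://localhost:9200", "medal")
    index_bytes = primary_store_bytes(stats, "medal")
    source_bytes = os.path.getsize("full_data.csv")
    print(f"index/source size ratio: {size_ratio(index_bytes, source_bytes):.2f}")
```

Comparing raw byte counts this way also avoids confusion between the units shown by different tools. Note that the "primaries" figure excludes replica shards, which is usually what you want when comparing against a single copy of the source file.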
