Elasticsearch index size less than dataset disk space

Fradi · July 5, 2021, 5:37pm

Hello there,
I'm running Elasticsearch 7.10.1 and working with full_data.csv (14.12 GB) from the MeDAL dataset on Kaggle. I split it into 179 CSV files of 300,000 lines each and loaded them into Elasticsearch. However, the resulting index is smaller than the original dataset: it comes to 12.9gb. The number of documents loaded matches the original. For every column from the original file I set the analyzer type and English stopwords. My intuition tells me the index should be larger than the original. Is it possible for the index to take up less disk space than the source data?

warkolm · July 5, 2021, 11:28pm

Yes, it's possible: Elasticsearch compresses indexed data by default.
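
To get a feel for why compressed storage can end up smaller than the raw CSV, here is a minimal Python sketch. It is only an illustration under assumptions: it uses the standard-library `zlib` (DEFLATE) as a stand-in rather than Elasticsearch's own codec (the default codec uses LZ4 for stored fields), the `compression_ratio` helper is a hypothetical name, and the sample text is made up rather than taken from the MeDAL data. Actual index size also depends on the mapping, analyzers, and index structures, so the number printed here says nothing about the real dataset.

```python
import zlib


def compression_ratio(text: str, level: int = 6) -> float:
    """Return compressed size divided by raw size for UTF-8 encoded text."""
    raw = text.encode("utf-8")
    compressed = zlib.compress(raw, level)
    return len(compressed) / len(raw)


# Hypothetical sample: repetitive English prose, not real MeDAL content.
sample = (
    "The patient was treated with the standard dose and the response "
    "was measured after the treatment period. "
) * 200

ratio = compression_ratio(sample)
print(f"compressed/raw = {ratio:.3f}")
```

A ratio below 1 means the compressed form is smaller than the input. Real text is far less repetitive than this sample, so expect a much more modest saving in practice.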

