How could an index size inflate 3 times larger than raw data size?

chenryn · June 3, 2016, 8:45am

I have an elasticsearch-1.7 cluster and ingested about 800GB of log data into one index. The /_cat/indices API then showed the index size as 2.2TB.
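For reference, the inflation factor implied by those two figures can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal units (1TB = 1000GB); the helper name `inflation_ratio` is illustrative only.

```python
def inflation_ratio(index_bytes, raw_bytes):
    """Return how many times larger the index is than the raw input."""
    return index_bytes / raw_bytes

GB = 10**9   # assumption: decimal gigabytes
TB = 10**12  # assumption: decimal terabytes

raw_size = 800 * GB    # raw log data ingested
index_size = 2.2 * TB  # size reported by /_cat/indices

ratio = inflation_ratio(index_size, raw_size)
print(f"{ratio:.2f}x")  # prints "2.75x"
```

With binary units the factor comes out slightly higher, but either way it is close to 3.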

The number_of_replicas of this index is already set to 0, and the _all field is disabled too. So how could the index size inflate so much? Nearly 3 times.
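To make those two settings concrete, here is a rough sketch of what an index creation body with them might look like in Elasticsearch 1.x. The actual settings and mapping were not posted, so the type name `logs` is hypothetical and the real index surely contains more than this.

```python
import json

def build_index_body(type_name="logs"):
    """Build a create-index body with no replicas and _all disabled.

    type_name is a hypothetical placeholder; the real _type was not given.
    """
    return {
        "settings": {"index": {"number_of_replicas": 0}},
        "mappings": {type_name: {"_all": {"enabled": False}}},
    }

body = build_index_body()
# This JSON would be sent as the body of the create-index request.
print(json.dumps(body, indent=2))
```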

Most of the log data has no extracted fields: those documents have only a few meta fields and one raw message. Only 10% of the data is in JSON format and may have some additional fields.

I have heard that elasticsearch-1.x may use more space after segment merges if multiple _type values have some field in common, but typeA has a few long documents and typeB has lots of short documents.

Could the same problem occur with only one _type in an index? Is it resolved in elasticsearch-2.3?

warkolm · June 4, 2016, 8:26am

What does the mapping look like?
