Index size is much smaller than the raw document size when ingested via Logstash

madhanbaskar · August 9, 2021, 1:09pm

I have a JSON file containing 93k records. The file is 413 MB. When indexed via Logstash, the index is only 88 MB, and I can see that all the records were ingested.
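To put the figures above in proportion, here is a quick back-of-the-envelope calculation. It is only a sketch: it assumes "93k" means exactly 93,000 records and that MB means 10^6 bytes, neither of which is stated precisely. Under those assumptions the index is roughly 4.7 times smaller than the raw file.

```python
# Assumed figures: "93k" taken as exactly 93,000 and MB as 10**6 bytes.
records = 93_000
raw_bytes = 413 * 10**6    # size of the raw JSON file
index_bytes = 88 * 10**6   # size reported for the index

ratio = raw_bytes / index_bytes            # how many times smaller the index is
fraction = index_bytes / raw_bytes         # index size as a share of the raw size
raw_per_record = raw_bytes / records       # average raw bytes per record
index_per_record = index_bytes / records   # average index bytes per record

print(f"raw/index ratio:  {ratio:.2f}x")
print(f"index share:      {fraction:.1%} of raw size")
print(f"raw per record:   {raw_per_record:.0f} bytes")
print(f"index per record: {index_per_record:.0f} bytes")
```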

Index settings:
50 fields are indexed and 200 fields are not indexed, as configured in the index template.

I have _source configured and no replicas. I removed the message fields in Logstash. I benchmarked with the above settings and saw a huge difference between the index size and the raw document size.

Can someone help me understand why there is such a big difference in size?

Thanks!

madhanbaskar · August 9, 2021, 1:14pm

@warkolm - Could you please help me here?

stephenb · August 9, 2021, 1:51pm

Elasticsearch compresses the data on ingest. See the index.codec index setting.

Certain types of text data compress very well.
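As a rough illustration of why repetitive JSON shrinks so much, the sketch below compresses some made-up records in fixed-size blocks using Python's zlib (DEFLATE). This is not Elasticsearch's actual storage path. The record layout (`make_record`), the 250-field shape, and the 16 KB block size are all assumptions chosen for the demo, and real ratios depend entirely on your data.

```python
import json
import zlib


def make_record(i):
    # Hypothetical document: 250 fields with repetitive keys and values.
    return {f"field_{n:03d}": f"value-{(i + n) % 10}" for n in range(250)}


def compressed_sizes(records, block_size=16 * 1024):
    """Return (raw_bytes, compressed_bytes) when compressing block by block."""
    raw = "\n".join(json.dumps(r) for r in records).encode("utf-8")
    packed = 0
    for start in range(0, len(raw), block_size):
        # Each block is compressed independently (block size is an assumption).
        packed += len(zlib.compress(raw[start:start + block_size], 6))
    return len(raw), packed


raw_size, packed_size = compressed_sizes([make_record(i) for i in range(200)])
print(f"raw: {raw_size} bytes, compressed: {packed_size} bytes, "
      f"ratio: {raw_size / packed_size:.1f}x")
```

The repeated field names and similar values are what the compressor exploits, which is the same reason structured log or event data tends to take far less space on disk than in its raw JSON form.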

Elasticsearch is very efficient with the fields indexed.

It looks like you also cleaned up or removed fields and data that were not required.

Are you seeing any issues in the data when it is retrieved? If not, it sounds like you are in good shape.

madhanbaskar · August 9, 2021, 2:23pm

Thanks Stephen!
As of now, I don't see any issues in the data when it is retrieved.
Great that I'm in good shape. Thanks!
