Index size is much smaller than the raw document size when ingested via Logstash

I have a JSON file containing 93k records. The file is 413MB, but after indexing it via Logstash the index is only 88MB. I can see that all the records were ingested.

Index settings:
I have 50 fields that are indexed and 200 fields that are not indexed, as configured in the index template.

I have _source configured and no replicas configured, and I removed the message fields in Logstash. I benchmarked with these settings and see a huge difference between the index size and the raw document size.

Can someone help me understand why there is such a big difference in size?

Thanks!

@warkolm - Could you please help me here?

Elasticsearch compresses the data on ingest. See the index.codec index setting.

Certain types of text data compress very well.
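To get a feel for how well repetitive JSON compresses, here is a minimal sketch. It uses Python's zlib purely as a stand-in, not the codec Elasticsearch actually applies, and the record fields and values are made up for illustration.

```python
import json
import zlib


def make_records(n):
    """Build n hypothetical log-like records (field names are invented)."""
    return [
        {
            "timestamp": f"2023-01-01T00:00:{i % 60:02d}Z",
            "host": f"server-{i % 5}",
            "level": "INFO",
            "message": "request completed successfully",
            "status": 200,
        }
        for i in range(n)
    ]


def compression_ratio(records):
    """Return compressed size divided by raw size for newline-delimited JSON."""
    raw = "\n".join(json.dumps(r) for r in records).encode("utf-8")
    compressed = zlib.compress(raw, 6)
    return len(compressed) / len(raw)


ratio = compression_ratio(make_records(10_000))
print(f"compressed size is {ratio:.1%} of the raw size")
```

Repeated keys and repeated values are what make this kind of data shrink so much. Real documents with more varied values will usually compress less than this toy data does.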

Elasticsearch is very efficient with the fields indexed.

It looks like you also cleaned up / removed fields and data that were not required.

Are you seeing any issues in the data when it is retrieved? If not, it sounds like you are in good shape.
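For reference, the figures from the question can be put side by side with a quick calculation. This is only a sketch: it assumes "MB" means 10**6 bytes and that "93k" is exactly 93,000 documents, neither of which is stated in the thread.

```python
# Figures quoted in the question.
RAW_MB = 413        # size of the JSON file
INDEX_MB = 88       # size of the resulting index
DOC_COUNT = 93_000  # assumption: "93k" taken as exactly 93,000


def size_summary(raw_mb, index_mb, doc_count):
    """Return (index/raw ratio, raw bytes per doc, index bytes per doc)."""
    ratio = index_mb / raw_mb
    # Assumption: 1 MB = 10**6 bytes.
    raw_bytes_per_doc = raw_mb * 10**6 / doc_count
    index_bytes_per_doc = index_mb * 10**6 / doc_count
    return ratio, raw_bytes_per_doc, index_bytes_per_doc


ratio, raw_per_doc, index_per_doc = size_summary(RAW_MB, INDEX_MB, DOC_COUNT)
print(f"index is {ratio:.1%} of the raw file size")
print(f"raw: {raw_per_doc:.0f} bytes/doc, index: {index_per_doc:.0f} bytes/doc")
```

So the index is roughly a fifth of the raw file, which is plausible once compression and the removed fields are taken into account.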

Thanks Stephen!
As of now, I don't see any issues in the data when it is retrieved.
Glad to hear I'm in good shape. Thanks!
