Elasticsearch Compression ratio

We want to use Elasticsearch for a large amount of data.
One of the important issues is storage usage.

We created a sample index with 412 million rows.

412 million rows take 242 GB of disk ===> ~590 bytes per row.

We know each row of our data in JSON format is 800-1000 bytes in size.

So Elasticsearch compressed our ~900-byte rows down to about 580 bytes each...

Is there any better way to compress our data?

There are some guidelines here. What does your mapping look like?

Most fields are integers...
We have 50-60 fields per document, and most of them need to be searchable (exact search and range search).
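
Simplified, the mapping looks something like this (a rough sketch only, in Kibana Dev Tools syntax; the index name, the doc type, and the field names are just placeholders, assuming Elasticsearch 5.x where numeric fields support both exact and range queries out of the box):

```
PUT sample_index
{
  "mappings": {
    "doc": {
      "properties": {
        // integer fields used for exact matches and range filters
        "user_id":     { "type": "integer" },
        "event_code":  { "type": "integer" },
        "duration_ms": { "type": "long" },
        "created_at":  { "type": "date" }
      }
    }
  }
}
```

Exact matches are then term queries and range searches are range queries on the same fields, with no extra sub-fields needed.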

Are you using the best_compression codec? Do you have the _all field enabled? If so, do you need it?

We use the default settings of the Logstash template and Elasticsearch...
The compression is the default (I think LZ4) and the _all field is enabled.
We don't need the _all field, so we should disable it...
But does best_compression impact indexing performance?
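
For disabling _all, I assume it has to be set in the mapping when the index is created, something like the sketch below (sample_index and the doc type are placeholders, and existing indices would need a reindex since _all cannot be toggled on an existing index):

```
PUT sample_index
{
  "mappings": {
    "doc": {
      // drop the catch-all _all field to save index space
      "_all": { "enabled": false }
    }
  }
}
```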

Using best_compression does have some impact on indexing performance, but it compresses the source a lot better and can save a significant amount of disk space. Disabling the _all field will also save space if you do not need it. If you have fields that do not need to be aggregated on or subject to free-text search, you can also slim down the default Logstash mappings and avoid having all fields dual-mapped. I discussed these topics and the trade-offs in a blog post.
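
As a rough illustration (not a drop-in config), a slimmed-down Logstash template could look something like this. The template name, index pattern, and fields are placeholders; the "template" key is the 5.x form of the legacy template API (6.x renames it to "index_patterns"), and index.codec has to be set at index creation time:

```
PUT _template/logstash_slim
{
  "template": "logstash-*",
  "settings": {
    // heavier DEFLATE compression for stored fields instead of the default LZ4
    "index.codec": "best_compression"
  },
  "mappings": {
    "doc": {
      // no catch-all field
      "_all": { "enabled": false },
      "properties": {
        // numeric field: exact and range searchable, no analysis needed
        "status_code": { "type": "integer" },
        // single keyword mapping instead of the default analyzed text + .keyword dual mapping
        "client_name": { "type": "keyword" }
      }
    }
  }
}
```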
