So I'm testing Elasticsearch and it turns out I accumulate ~1 GB of data every 24 hours.
This is way more than I anticipated.
I initially planned to dump the data onto a hard drive, but a 1 TB drive will fill up fairly soon...
What do you do in cases like this? Where do you store that much data?
Force-merge old indices (the `_forcemerge` API). You can also look at the shrink API to reduce the number of shards. These aren't mutually exclusive, but they usually don't save a ton of space.
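For reference, both are plain REST calls. A sketch, assuming a local cluster on port 9200 and a hypothetical daily index name (`logs-2019.01.01` is a placeholder):

```shell
# Force-merge a read-only daily index down to a single segment.
# Only do this on indices that are no longer being written to,
# or the merged segments will just fragment again.
curl -X POST "localhost:9200/logs-2019.01.01/_forcemerge?max_num_segments=1"

# Shrink an index to fewer primary shards. The shrink API requires
# the index to be write-blocked and all its shards to sit on one node
# (e.g. via index.routing.allocation.require._name) before shrinking.
curl -X PUT "localhost:9200/logs-2019.01.01/_settings" \
  -H 'Content-Type: application/json' \
  -d '{"index.blocks.write": true}'

curl -X POST "localhost:9200/logs-2019.01.01/_shrink/logs-2019.01.01-shrunk" \
  -H 'Content-Type: application/json' \
  -d '{"settings": {"index.number_of_shards": 1}}'
```

The shard count you shrink to must be a factor of the original count (e.g. 6 → 3, 2, or 1).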
Yeah - I agree you can do certain optimizations and save some space.
But I'd think large companies also just don't store data on a single hard drive.
I strongly suspect they use alternatives like some kind of expandable cloud storage...
I'm more interested in that sort of approach.
A current mid-size server can take an 8 TB RAID disk subsystem, e.g. 8 × 2.5" slots with enterprise SAS disks of around 1 TB capacity each.
Even with that single server (in practice you'd want at least three nodes for a truly distributed system), a 1 GB/day growth rate could run comfortably for 4000 days (>10 years) before reaching a 4 TB disk allocation, without even thinking about archiving or aging out index data. I don't know your resource calculations, but after 10 years you will certainly be ready to move to the next server hardware generation anyway.
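The back-of-the-envelope math above can be checked in a couple of lines; a quick sketch, assuming 4 TB usable after RAID overhead and the ~1 GB/day growth rate from the original post:

```shell
# Assumptions: 4 TB usable disk (decimal GB), ~1 GB/day index growth.
usable_gb=4000
growth_gb_per_day=1

days=$(( usable_gb / growth_gb_per_day ))
echo "$days days, roughly $(( days / 365 )) years"
# → 4000 days, roughly 10 years
```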