Data storage - too much space taken


(jc) #1

So I'm testing Elasticsearch, and it turns out that I accumulate ~1 GB of data every 24 hours.
This is way more than I anticipated.
I initially planned to dump the data on a hard drive, but a 1 TB drive will fill up fairly soon...

What do you guys do in cases like this? Where do you store so much data?


(Nik Everett) #2

Most folks age out data after a certain amount of time, removing those indices.
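
As a rough illustration of aging out data, here is a minimal sketch that picks which daily indices fall outside a retention window. It assumes one index per day named like `logs-2017.01.15`; the `logs-` prefix, the date format, and the function name `expired_indices` are all hypothetical, so adjust them to your own naming scheme. It only computes the names; actually deleting them is left to whatever client or tool you use.

```python
from datetime import date, datetime, timedelta


def expired_indices(index_names, today, retention_days,
                    prefix="logs-", fmt="%Y.%m.%d"):
    """Return the index names whose date suffix is older than the retention window.

    Assumes daily indices named <prefix><date>; names that do not match
    the prefix or the date format are skipped, never returned.
    """
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in index_names:
        if not name.startswith(prefix):
            continue
        try:
            day = datetime.strptime(name[len(prefix):], fmt).date()
        except ValueError:
            # Not a dated index in the expected format; leave it alone.
            continue
        if day < cutoff:
            expired.append(name)
    return expired


if __name__ == "__main__":
    names = ["logs-2017.01.01", "logs-2017.01.20", "logs-2017.02.01", ".kibana"]
    print(expired_indices(names, today=date(2017, 2, 5), retention_days=30))
```

With a 30-day window, 1 GB/day of growth levels off at roughly 30 GB of primary data instead of growing without bound.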

You can usually save space in a few ways:

  1. Turn off doc_values for fields you won't sort or aggregate on. This saves space and a bit of indexing time.
  2. Turn off indexing on fields you don't need to search by, or use index_options to store fewer things.
  3. _forcemerge old indices. You can also look at the shrink API to reduce the number of shards. These aren't mutually exclusive, but they usually don't save a ton of space.

There are probably more things I'm forgetting.
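
To make the first two points concrete, here is a sketch of what such a mapping could look like, built as a Python dict and printed as the JSON body you would send when creating the index. The field names (`session_id`, `raw_payload`, `message`) are made up for illustration. It also assumes the typeless mapping layout of newer Elasticsearch versions; older versions nest `properties` under a type name. The parameters themselves (`doc_values`, `index`, `index_options`) are standard mapping options.

```python
import json

# Hypothetical field names; only the mapping parameters are real options.
mapping = {
    "mappings": {
        "properties": {
            # Kept fully featured: we sort and filter on time.
            "@timestamp": {"type": "date"},
            # Searched by exact value, but never sorted or aggregated on,
            # so the doc_values column can be dropped.
            "session_id": {"type": "keyword", "doc_values": False},
            # Only ever read back from _source, never searched or aggregated.
            "raw_payload": {"type": "keyword", "index": False, "doc_values": False},
            # Full-text searchable, but without positions; this gives up
            # phrase queries in exchange for a smaller inverted index.
            "message": {"type": "text", "index_options": "freqs"},
        }
    }
}

if __name__ == "__main__":
    print(json.dumps(mapping, indent=2))
```

How much this saves depends heavily on your data, so it is worth comparing index sizes on a sample before and after.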


(jc) #3

Yeah, I agree you can do certain optimizations and save some space.
But I think large companies don't store data on a single hard drive.
I strongly suspect they use alternatives such as expandable cloud storage.
I'm more interested in that sort of approach.


(Jörg Prante) #4

A current mid-size server can take about an 8 TB RAID disk subsystem, e.g. 8 x 2.5" slots with enterprise SAS disks of around 1 TB capacity each.

Even with that single server (in practice you would want at least three servers to establish a truly distributed system), you could comfortably sustain 1 GB/day of disk growth for 4000 days (more than 10 years) before reaching 4 TB of disk allocation, without even thinking of archiving or aging out the index data. I don't know about your resource calculations, but after 10 years you will certainly feel ready to move to the next server hardware generation.
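
The arithmetic behind that estimate is simple enough to sketch. The helper below is hypothetical and makes simplifying assumptions: growth is constant, 1 TB is counted as 1000 GB, and replicas are ignored (each replica stores another full copy, so one replica roughly doubles disk use).

```python
def days_until_full(capacity_gb, growth_gb_per_day, used_gb=0.0):
    """Days until the disk is full, assuming a constant daily growth rate."""
    if growth_gb_per_day <= 0:
        raise ValueError("growth_gb_per_day must be positive")
    remaining_gb = max(capacity_gb - used_gb, 0.0)
    return remaining_gb / growth_gb_per_day


if __name__ == "__main__":
    for label, capacity_gb in [("1 TB drive", 1000), ("4 TB allocation", 4000)]:
        days = days_until_full(capacity_gb, growth_gb_per_day=1.0)
        print(f"{label}: {days:.0f} days, about {days / 365:.1f} years")
```

At 1 GB/day, the 1 TB drive from the original question lasts about 1000 days (a bit under three years), and a 4 TB allocation lasts about 4000 days.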

