So I'm testing Elasticsearch and it turns out I accumulate ~1 GB of data every 24 hours.
This is way more than I anticipated.
I initially planned to dump the data onto a hard drive, but a 1 TB drive will fill up fairly soon...
What do you do in cases like this? Where do you store that much data?
Force-merge old indices (the `_forcemerge` API). You can also look at the shrink API to reduce the number of shards. These aren't mutually exclusive, but they usually don't save a ton of space.
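For reference, both are plain REST calls. A sketch, assuming a local cluster on port 9200 and a hypothetical daily index name (`logs-2019.01.01` is a placeholder):

```shell
# Force-merge a read-only daily index down to a single segment.
# Only do this on indices that are no longer being written to,
# or the merged segments will just fragment again.
curl -X POST "localhost:9200/logs-2019.01.01/_forcemerge?max_num_segments=1"

# Shrink an index to fewer primary shards. The shrink API requires
# the index to be write-blocked and all its shards to sit on one node
# (e.g. via index.routing.allocation.require._name) before shrinking.
curl -X PUT "localhost:9200/logs-2019.01.01/_settings" \
  -H 'Content-Type: application/json' \
  -d '{"index.blocks.write": true}'

curl -X POST "localhost:9200/logs-2019.01.01/_shrink/logs-2019.01.01-shrunk" \
  -H 'Content-Type: application/json' \
  -d '{"settings": {"index.number_of_shards": 1}}'
```

The shard count you shrink to must be a factor of the original count (e.g. 6 → 3, 2, or 1).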
Yeah - I agree you can do certain optimizations and save some space.
But I'd think large companies also just don't store data on a single hard drive.
I strongly suspect they use alternatives like some kind of expandable cloud storage...
I'm more interested in that sort of approach.
A current mid-size server can take an 8 TB RAID disk subsystem, e.g. 8 × 2.5" slots with enterprise SAS disks of around 1 TB capacity each.
Even with that single server (in practice you'd want at least three nodes for a truly distributed system), a 1 GB/day growth rate could run comfortably for 4000 days (>10 years) before reaching a 4 TB disk allocation, without even thinking about archiving or aging out index data. I don't know your resource calculations, but after 10 years you will certainly be ready to move to the next server hardware generation anyway.
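The back-of-the-envelope math above can be checked in a couple of lines; a quick sketch, assuming 4 TB usable after RAID overhead and the ~1 GB/day growth rate from the original post:

```shell
# Assumptions: 4 TB usable disk (decimal GB), ~1 GB/day index growth.
usable_gb=4000
growth_gb_per_day=1

days=$(( usable_gb / growth_gb_per_day ))
echo "$days days, roughly $(( days / 365 )) years"
# → 4000 days, roughly 10 years
```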