This blog post, while old, is still about right. TL;DR: disk usage will often be within a factor of 2 of the input data size (in either direction), depending on the data and the mappings.
Don't store large binary data like images in ES; it's a waste of your valuable in-cluster resources. Put them somewhere cheaper, with a link stored in ES that points to the binary data. You can index them (e.g. as vector embeddings) if you want to use vector search, just don't store the binaries themselves there. See store | Elasticsearch Guide [8.11] | Elastic for a little more info on the difference.
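For illustration, here is a minimal sketch of a mapping that follows this advice, using the Python Elasticsearch client. The index name, field names and embedding dimension are just assumptions: the image bytes live in S3 (or any cheaper object store), and ES holds only a link plus an indexed embedding.

```python
from elasticsearch import Elasticsearch

# Assumed connection details; adjust for your cluster.
es = Elasticsearch("http://localhost:9200")

# Hypothetical index: ES stores a pointer to the image plus an indexed
# embedding, never the binary itself.
es.indices.create(
    index="images",
    mappings={
        "properties": {
            # Link to the binary stored in S3.
            "image_url": {"type": "keyword"},
            # Embedding produced outside ES (e.g. by an image model),
            # indexed for kNN / vector search.
            "image_embedding": {
                "type": "dense_vector",
                "dims": 512,            # depends on your model
                "index": True,
                "similarity": "cosine",
            },
            "caption": {"type": "text"},
        }
    },
)
```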
Thanks David. Yes, that sounds like a good idea. We can store the images in S3 or some other storage and do just the indexing and classifying elsewhere. Just curious - the accompanying Medium article (for some reason I am unable to paste the link here, the post gets flagged) seemed very promising. That was obviously a basic idea; to make a proper image search engine, we would have to employ a combination of sophisticated NLP, vector feature extraction, self-supervised classification, random forests, etc. Would all of this be possible by running just the classification, feature extraction, etc. on ES, while the actual images are stored on a different server, like S3, for serving to the end user?
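To make the basic split concrete, here is a rough sketch of the query side, continuing the hypothetical index above (field names, dimensions and index name are assumptions, not anything from the article): the embeddings and classifications are computed in your own pipeline, ES serves the kNN search over the indexed vectors, and the application resolves the returned S3 links when rendering results.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Hypothetical query vector, produced by the same image model used at
# index time (a real vector would come from your feature-extraction step).
query_vector = [0.1] * 512

resp = es.search(
    index="images",
    knn={
        "field": "image_embedding",
        "query_vector": query_vector,
        "k": 10,
        "num_candidates": 100,
    },
    source=["image_url", "caption"],
)

# The hits carry only the S3 links; the application fetches the actual
# images from S3 when serving them to the end user.
for hit in resp["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["image_url"])
```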
Thanks. I won't. I will stick to 50 GB shard sizes as you and others have explained in this post; I was using this just as an example. My concern is that a server with SSDs and high RAM (~64 GB) is very expensive, so I need a way to get an idea of how much space I might end up needing to index 5 PB of data (plus replicas; assuming only one replica for now), so that I can estimate the potential cost of such an endeavour.
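As a back-of-the-envelope check based on the factor-of-2 rule of thumb mentioned earlier in the thread (the ratios are only illustrative, not a sizing guarantee), a quick sketch of the range looks like this:

```python
# Rough capacity estimate, assuming the "within a factor of 2 either way"
# rule of thumb from earlier in the thread.
raw_data_pb = 5          # input data size in PB
replicas = 1             # one replica copy in addition to the primary

for ratio in (0.5, 1.0, 2.0):   # index size as a multiple of raw input
    primary_pb = raw_data_pb * ratio
    total_pb = primary_pb * (1 + replicas)
    print(f"ratio {ratio}: primaries ~{primary_pb} PB, total with replicas ~{total_pb} PB")
```

Under those assumptions the total footprint lands somewhere between roughly 5 PB and 20 PB, which is why running a small representative sample through your real mappings is the only way to narrow the estimate.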
Not sure; this is outside my area of expertise. I would hope so, but you might do better to open a separate thread on this question, because the experts in this area have probably stopped reading this thread by now.