Elasticsearch Cluster size

dadadima · March 20, 2020, 9:14am

I have a dataset on which I would like to perform Anomaly Detection.
The data size so far is 15-20TB, and it grows by 10-15GB per day. I do not need to query the data much (of course I will do some data exploration on my own, or maybe 1-2 users at a time, but it definitely won't power any large-scale multi-user application). So the main purpose is Anomaly Detection, and I will be the only one managing it.

Initially, I plan to use only the static 15-20TB dataset. So:

Does the rule of thumb of 50GB per shard still apply in this case?
How many data nodes should I use?
How many machine learning nodes should I use?
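For the first question, the arithmetic is simple if you take the ~50GB-per-shard figure as a target primary shard size. The sketch below assumes that rule of thumb, treats 1TB as 1000GB, and uses the raw data size; the actual on-disk index size can be larger or smaller than the raw size depending on mappings and compression, so it is worth measuring on a sample first.

```python
import math

# Assumption: ~50 GB per primary shard is a rule of thumb, not a hard limit.
TARGET_SHARD_GB = 50


def primary_shards_needed(data_gb: float, target_shard_gb: float = TARGET_SHARD_GB) -> int:
    """Smallest number of primary shards that keeps each shard at or below the target size."""
    return math.ceil(data_gb / target_shard_gb)


# Static dataset from the question: 15-20 TB (1 TB treated as 1000 GB).
for tb in (15, 20):
    print(tb, "TB ->", primary_shards_needed(tb * 1000), "primary shards")
```

With these assumptions the static dataset works out to roughly 300-400 primary shards, before counting any replicas.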

If I let the dataset grow at the rate mentioned above, will the answers to these questions change much?
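A rough way to see how much growth matters is to project the size linearly. This sketch assumes a constant daily ingest, no data deletion, and the same hypothetical 50GB target shard size; it only estimates storage and shard count, not query or machine learning load.

```python
import math

TARGET_SHARD_GB = 50  # assumed rule-of-thumb primary shard size


def projected_size_gb(initial_gb: float, daily_gb: float, days: int) -> float:
    """Linear growth: starting size plus a constant amount ingested each day."""
    return initial_gb + daily_gb * days


def added_primary_shards(daily_gb: float, days: int,
                         target_shard_gb: float = TARGET_SHARD_GB) -> int:
    """Primary shards needed to hold only the data ingested over `days`."""
    return math.ceil(daily_gb * days / target_shard_gb)


# Scenarios from the question: 15-20 TB today, growing 10-15 GB/day, one year out.
for initial_tb, daily_gb in ((15, 10), (20, 15)):
    size_gb = projected_size_gb(initial_tb * 1000, daily_gb, 365)
    print(f"{initial_tb} TB + {daily_gb} GB/day -> {size_gb / 1000:.1f} TB after a year, "
          f"+{added_primary_shards(daily_gb, 365)} primary shards")
```

Under these assumptions a year of growth adds about 3.65-5.5TB (roughly 73-110 primary shards), i.e. on the order of 20-35% on top of the initial dataset.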

Thank you

richcollier · March 24, 2020, 1:07pm

Cluster size calculations depend on many things, including:

Ingest rate per day
Data retention (how long you expect to keep the data around)
Expected search performance (how fast queries need to be)
Node hardware and storage hardware performance
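The first two factors (ingest rate and retention), together with replica count and per-node disk, can be turned into a back-of-the-envelope storage estimate. The sketch below is a storage-only model under hypothetical defaults: one replica, an index-to-raw size ratio of 1.0, 25% spare disk headroom, and 4000GB of usable disk per data node. None of these numbers are recommendations. It says nothing about search performance, heap, or machine learning nodes, which depend on the hardware and the jobs themselves and are best checked by benchmarking.

```python
import math
from dataclasses import dataclass


@dataclass
class SizingInputs:
    raw_data_gb: float               # data already present (e.g. the static 15-20 TB)
    daily_ingest_gb: float           # ingest rate per day
    retention_days: int              # how many days of new data are kept
    replicas: int = 1                # assumed: one replica per primary shard
    index_expansion: float = 1.0     # assumed: on-disk size / raw size; measure on a sample
    headroom: float = 0.25           # assumed: extra capacity for disk watermarks and growth
    disk_per_node_gb: float = 4000   # hypothetical usable disk per data node


def total_storage_gb(s: SizingInputs) -> float:
    """Indexed data for the retention window, times all copies, plus headroom."""
    data_gb = (s.raw_data_gb + s.daily_ingest_gb * s.retention_days) * s.index_expansion
    return data_gb * (s.replicas + 1) * (1 + s.headroom)


def data_nodes_needed(s: SizingInputs) -> int:
    """Data nodes required if disk capacity is the only constraint."""
    return math.ceil(total_storage_gb(s) / s.disk_per_node_gb)


inputs = SizingInputs(raw_data_gb=20000, daily_ingest_gb=15, retention_days=365)
print(total_storage_gb(inputs), "GB ->", data_nodes_needed(inputs), "data nodes")
```

Changing `replicas` or `disk_per_node_gb` shows how strongly the node count depends on choices that have nothing to do with the data itself, which is why the remaining factors matter so much.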

In other words, you could architect a cluster with a low-ish node count on mediocre hardware that operates decently, or a cluster with a right-sized number of nodes on fast hardware (with SSDs, for example) that would be blazingly fast.

If you are truly interested in doing Anomaly Detection in a production environment, then that is a paid feature. If you are in the market for an Elastic Subscription, then you can also have one of our Solutions Architects help you size a cluster for the appropriate use case. Just contact us (sales@elastic.co) and we're here to help.

If, on the other hand, you are just doing this for academic research (which seems to be the case from your previous posts), then you're going to be a bit on your own on this one. You can get guidance from our blogs - for example:

https://www.elastic.co/blog/sizing-hot-warm-architectures-for-logging-and-metrics-in-the-elasticsearch-service-on-elastic-cloud

Good luck!

system · April 21, 2020, 1:07pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.
