Best practices for Elastic DB deployment for 100TB data

Hello,
We are designing a new product based on Elasticsearch that is supposed to hold around 100 TB of data.
The deployment will be on-prem (bare metal with Linux installed),
and we would like to ask:
What are the best practices from a resource deployment point of view? Specifically:
How many hosts do we need for this amount of data in Elasticsearch?
What is the best way to deploy Elasticsearch? Containers?

Thanks
Noam

What is the use case? What is the expected indexing and query throughput? Do you have any latency requirements? What type of data are you indexing? What type of queries will you be running?

I also suggest taking a look at the GitHub project gbaptista/elastic-calculator ("Elasticsearch cluster calculator: How many shards and replicas should I have?") to get an idea.

As @Christian_Dahlqvist mentioned, knowing the use case and throughput would help with a proper capacity calculation.


Data will be sent to Elasticsearch via Kafka; the data will be JSON or logs from several systems.
Regarding latency: this is not a real-time system; we assume up to 100 ms.
Regarding the type of queries: Query DSL.
I hope this is enough for a high-level answer, but do let me know if I should provide more information.

It sounds like you will have time-based data in the form of logs coming in. Apart from that it does not tell me much.

The following parameters will affect the sizing of your cluster, but it is not a complete list:

  • Number of different types of data being ingested.
  • Peak and average ingest rate for different data types.
  • The expected retention period of each data type.
  • Size and complexity of documents and associated mappings.
  • How you are querying the data. Are you using Kibana dashboards? Are you querying the data in other ways? Are queries targeting parts of the total data set or all data?
  • How many users/query concurrency do you expect to need to support? What are the latency requirements for dashboards and/or queries?
  • Hardware used, especially storage performance.

I would recommend reading this blog post and watching this webinar. There are other resources available as well. In the end, it is likely you will need to run some benchmarks in order to get an accurate estimate.
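
To show how a few of the parameters above combine, here is a rough, storage-only sketch in Python. Every default in it is a placeholder assumption rather than a recommendation: one replica, on-disk size equal to raw size (`expansion_ratio=1.0`), 8 TB of disk per node filled to at most 85%, roughly 40 GB per primary shard, and a 90-day retention period. The function names are hypothetical. The sketch ignores CPU, heap, query load and storage performance, which is why benchmarking with your own data is still needed.

```python
import math


def estimate_data_nodes(raw_tb, replicas=1, expansion_ratio=1.0,
                        max_disk_fill=0.85, disk_tb_per_node=8.0):
    """Storage-only estimate of the number of data nodes.

    All defaults are illustrative assumptions; measure expansion_ratio
    by indexing a sample of your own data with your own mappings.
    """
    # Primary data plus one full copy per replica.
    total_tb = raw_tb * expansion_ratio * (1 + replicas)
    # Leave headroom on each disk instead of filling it completely.
    usable_tb_per_node = disk_tb_per_node * max_disk_fill
    return math.ceil(total_tb / usable_tb_per_node)


def estimate_primary_shards(raw_tb, expansion_ratio=1.0, target_shard_gb=40.0):
    """Number of primary shards if each holds about target_shard_gb."""
    primary_gb = raw_tb * expansion_ratio * 1000
    return math.ceil(primary_gb / target_shard_gb)


def average_daily_ingest_tb(raw_tb, retention_days=90):
    """Average daily ingest implied by a steady-state volume and retention."""
    return raw_tb / retention_days


if __name__ == "__main__":
    print("data nodes:", estimate_data_nodes(100))
    print("primary shards:", estimate_primary_shards(100))
    print("avg ingest TB/day:", round(average_daily_ingest_tb(100), 2))
```

With these placeholder values, 100 TB works out to 30 data nodes and 2,500 primary shards on storage alone. Changing any single assumption (for example a different replica count or denser disks) shifts the result considerably, so treat the numbers only as a starting point for a benchmark.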

