I’m working on a solution involving searchable snapshots, and I need some clarification to properly size a second cluster. Here’s the context:
I plan to use my production cluster (Platinum license) to store approximately 30 TB of data on an S3-based repository. These 30 TB represent 13 months of log data, matching the desired retention period.
I want to set up a secondary cluster (Enterprise license) to query the data through searchable snapshots, using frozen nodes to optimize costs.
I have a few doubts about sizing and the functionality of the secondary cluster:
Local storage requirements:
Does the local storage of the secondary cluster need to have a capacity equivalent to the S3 repository (30 TB)? Or can it be sized significantly smaller, for example 4 TB?
Search performance across the entire dataset:
If a frozen node has only 4 TB of local storage, would it still be possible to execute a single search query covering the full 13-month data range?
Alternatively, would the query need to be split into smaller time intervals to cover the entire dataset?
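For illustration, the kind of single query I have in mind would span the full retention window in one request (the index pattern and timestamp field here are placeholders for whatever the data streams actually use):

```
GET logs-*/_search
{
  "query": {
    "range": {
      "@timestamp": {
        "gte": "now-13M/M",
        "lte": "now"
      }
    }
  }
}
```

The question is whether a frozen node with only 4 TB of local disk can serve this in one shot, or whether it has to be broken into month-by-month requests.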
I’d appreciate any guidance or recommendations to proceed with the design and sizing of the secondary cluster.
This confuses me, as the Platinum license only supports local storage for all nodes (SSD / HDD / EBS, etc.), not S3-based searchable snapshots... so I am unclear what you mean by the statement above.
2nd, I am confused by your proposed solution: why do you need a 2nd cluster in the first place? I doubt adding a 2nd cluster for 30 TB of data will be more cost-effective than one properly configured and sized, performant cluster (either Platinum or Enterprise).
3rd, how are you planning to "link / migrate / load" the data to the 2nd cluster? I bring this up because if you use CCS or CCR, all clusters involved need to have the same license level.
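To make the licensing point concrete: if CCS were licensed on both sides, queries from the second cluster would address the production cluster through a remote-cluster prefix, something like the sketch below (the remote alias `prod` is hypothetical):

```
GET prod:logs-*/_search
{
  "query": { "match_all": {} }
}
```

That `prod:` prefix only works once the remote connection is configured, and both clusters would need the same license level for it to be a supported setup.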
In short, the local storage on the frozen tier (e.g., 4 TB of SSD) can be considered a cache of the data in the searchable snapshots in S3... and a single frozen node can address up to roughly 90-100 TB of searchable snapshots that way.
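A minimal sketch of what that looks like on a dedicated frozen node, assuming defaults (the snapshots themselves stay in S3; only the shared cache lives on local disk):

```yaml
# elasticsearch.yml on a dedicated frozen node (illustrative values)
node.roles: [ data_frozen ]

# Portion of the local disk reserved as the shared cache for
# searchable-snapshot data; 90% is the default on dedicated
# frozen nodes, so 4 TB of SSD yields roughly 3.6 TB of cache.
xpack.searchable.snapshot.shared_cache.size: 90%
```

With this model, a single query over the full 13 months is possible even though the cache is far smaller than the repository: ranges not in the cache are fetched from S3 on demand, at the cost of latency, so splitting the query is a performance choice, not a requirement.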
So those are some initial answers, but it is unclear what you are actually trying to accomplish, whether you would actually get a better TCO, and whether the proposed configuration is licensable at all.
You should contact your Elastic Account team to discuss....