I'm planning a large-scale, self-managed Elasticsearch deployment for an application that will eventually handle around 8 petabytes (PB) of data.
I'm currently evaluating whether the Basic (free) license of Elasticsearch 8.x will be sufficient for this, or whether I will run into licensing-related limitations such as:
Feature restrictions (storing and querying data)
Scalability limits (nodes, shards, indices)
Support or legal concerns due to the Elastic License 2.0 (ELv2) terms
The cluster will likely include many nodes, and the use case involves:
Time-series data
Heavy indexing and search throughput
A mix of hot/warm data tiers
Possibly ILM
Could I also use Prometheus and Grafana as more open alternatives?
Thanks in advance.
The license level "only" activates additional features that might be interesting for you to have; a Basic license imposes nothing like size restrictions or speed limits.
That said, I'd ask a sales rep at Elastic for a quote for your use case.
For example, searchable snapshots is a really great feature for your use case: it could reduce the number of physical nodes you need if you are planning mid- to long-term retention of data. On the other hand, you have to buy an Enterprise license for it.
I like to think about it this way:
either you pay for the machines,
or you pay for the license.
Bonus point for the latter: you also get access to many other features (AI connectors, management of Elastic agents, and synthetic source, to name a few) and official support!
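To illustrate the "machines or license" trade-off, here is a rough back-of-the-envelope sketch. Every number in it is a hypothetical assumption (usable storage per node, the share of recent data, the replica counts), not a sizing recommendation. It only relies on the idea that data served from searchable snapshots does not need local replica copies, because the snapshot repository holds the backup copy.

```python
import math


def nodes_needed(data_tb: int, replicas: int, usable_tb_per_node: int) -> int:
    """Data nodes needed to hold data_tb of primaries plus their replica copies."""
    total_tb = data_tb * (1 + replicas)
    return math.ceil(total_tb / usable_tb_per_node)


# Hypothetical inputs: 8 PB taken as 8000 TB, 10 TB usable per data node.
TOTAL_TB = 8000
USABLE_TB_PER_NODE = 10

# Scenario A: everything on local disks with one replica.
all_local = nodes_needed(TOTAL_TB, replicas=1, usable_tb_per_node=USABLE_TB_PER_NODE)

# Scenario B (assumed split): 800 TB of recent data with one replica,
# 7200 TB of older data mounted from snapshots with no local replica.
recent = nodes_needed(800, replicas=1, usable_tb_per_node=USABLE_TB_PER_NODE)
older = nodes_needed(7200, replicas=0, usable_tb_per_node=USABLE_TB_PER_NODE)

print(f"all local: {all_local} nodes")
print(f"with searchable snapshots: {recent + older} nodes")
```

This ignores disk watermarks, indexing overhead, and the cost of the snapshot repository itself, so treat it as a way to frame the question for a quote rather than as an answer.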
More on this here (even though it's not always easy to read this list):
But to answer your question again, you can use the basic license for your project.
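If you want to confirm which license level a running cluster actually has, the `GET /_license` API returns it. Below is a minimal sketch in Python; the `base_url` parameter and the trimmed sample body are assumptions for illustration, and the fetch helper leaves out the authentication and TLS settings that an 8.x cluster normally requires.

```python
import json
import urllib.request


def summarize_license(body: dict) -> str:
    """Summarize the 'license' object of a GET /_license response."""
    lic = body.get("license", {})
    return f"type={lic.get('type', 'unknown')} status={lic.get('status', 'unknown')}"


def fetch_license(base_url: str) -> dict:
    """Fetch GET /_license from a cluster (no auth/TLS handling in this sketch)."""
    with urllib.request.urlopen(f"{base_url}/_license") as resp:
        return json.load(resp)


# Trimmed, hypothetical response body used here instead of a live call.
sample = {"license": {"status": "active", "type": "basic"}}
print(summarize_license(sample))
```

Against a real cluster you would call `summarize_license(fetch_license(...))` with your own URL and credentials handling.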
From what you’ve written, there doesn’t seem to be a problem, but check the matrix at
for the features you intend to use or might use.
If you have specific concerns on the legal side (and it’s not clear why you would), please consult specialists.
Support? You have access to forums like this, where absolutely no SLA applies, and other than that … well, it would be self-managed and self-supported. I know some organisations where that wouldn’t be acceptable, but that’s a policy choice.
At this scale it's surely going to be outrageously expensive not to use searchable snapshots. The hardware savings are going to pay for the license and then some.
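For reference, this is roughly what an ILM policy that moves older data into searchable snapshots could look like, written as a Python dict that would be sent as the body of `PUT _ilm/policy/<name>`. The repository name, ages, and shard size are hypothetical placeholders, and, as noted above in the thread, the `searchable_snapshot` action needs an Enterprise license.

```python
import json

# Hypothetical snapshot repository name; it must already be registered.
SNAPSHOT_REPO = "my-snapshot-repo"


def build_ilm_policy(repo: str) -> dict:
    """Build an ILM policy body with hot, warm, and cold phases."""
    return {
        "policy": {
            "phases": {
                # Hot: roll over to a new index by shard size or age (assumed values).
                "hot": {
                    "actions": {
                        "rollover": {
                            "max_primary_shard_size": "50gb",
                            "max_age": "1d",
                        }
                    }
                },
                # Warm: merge segments once the index is no longer written to.
                "warm": {
                    "min_age": "7d",
                    "actions": {"forcemerge": {"max_num_segments": 1}},
                },
                # Cold: mount the index from the snapshot repository.
                "cold": {
                    "min_age": "30d",
                    "actions": {
                        "searchable_snapshot": {"snapshot_repository": repo}
                    },
                },
            }
        }
    }


policy = build_ilm_policy(SNAPSHOT_REPO)
print(json.dumps(policy, indent=2))
```

On a Basic license the hot and warm phases would still work; only the cold phase as written here depends on the paid feature.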
Thanks for your reply. So if it's self-managed with a subscription, will the servers need internet access, for example to verify the license? In my case that is not going to work.