I often see questions about hardware requirements, and I can't find any official requirements from Elastic.
For instance, I can't calculate a budget based on tests in production; I need to know how many resources I need before I start. Right now it is not easy to make even a rough estimate.
Could you please publish official hardware requirements, information on how to calculate them, and some simple examples? Or, failing that, guidance on how to work out the hardware requirements for a project?
Elasticsearch is quite a flexible piece of software and can be used for a lot of different use cases. Cluster architecture and hardware requirements often vary between types of use cases, so if you provide some information about yours we might be able to point you to some suitable resources.
Let's assume we want to store logs. For example, applications create 100GB of log files per day. I want to send the data to Elasticsearch using Filebeat, and the raw data should be parsed to extract the log level (info/error/warning etc.) into an additional field. I would keep all data for 1 month, but errors for 3 months. I'm thinking of storing all data in one index and copying errors to an additional index using a pipeline in Elasticsearch. I don't think replication is required, maybe just snapshots on top (not a "must have"). It is a very simple and general example.
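For illustration, the log-level parsing I have in mind would be something like the sketch below (just an idea, not a tested setup; it assumes Filebeat ships the raw line in the `message` field, the pipeline name and Elasticsearch URL are placeholders, and copying errors to a second index is not shown here):

```python
# Sketch: create an ingest pipeline that pulls the log level out of the raw message.
# Assumes Filebeat ships the raw line in the "message" field.
import requests

ES_URL = "http://localhost:9200"          # placeholder Elasticsearch endpoint

pipeline = {
    "description": "Extract log level (INFO/WARN/ERROR/...) into log.level",
    "processors": [
        {
            "grok": {
                "field": "message",
                "patterns": ["%{LOGLEVEL:log.level}"],
                "ignore_failure": True    # keep the document even if no level is found
            }
        }
    ],
}

resp = requests.put(f"{ES_URL}/_ingest/pipeline/parse-log-level", json=pipeline)
resp.raise_for_status()
```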
Would you be able to suggest a setup and hardware requirements?
I would recommend using a daily index for the non-error data and perhaps a weekly index for the error data, assuming errors are a reasonably small portion of the data. This allows you to manage retention by deleting whole indices, which is the recommended approach.
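As an illustration of the delete-by-index approach, here is a rough sketch (the index naming pattern `logs-YYYY.MM.DD` and the Elasticsearch URL are assumptions; adjust them to your setup). On recent versions, index lifecycle management (ILM) can handle this for you, but the idea is the same:

```python
# Sketch: delete daily indices older than the retention period.
# Assumes daily indices are named like "logs-2020.06.01".
import datetime
import requests

ES_URL = "http://localhost:9200"          # placeholder Elasticsearch endpoint
RETENTION_DAYS = 30

cutoff = datetime.date.today() - datetime.timedelta(days=RETENTION_DAYS)

# List matching indices, then delete any whose date suffix is past retention.
rows = requests.get(f"{ES_URL}/_cat/indices/logs-*?h=index&format=json").json()
for row in rows:
    name = row["index"]                   # e.g. "logs-2020.06.01"
    day = datetime.datetime.strptime(name.split("-", 1)[1], "%Y.%m.%d").date()
    if day < cutoff:
        requests.delete(f"{ES_URL}/{name}")
```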
100GB of raw data per day across 30 days is around 3TB of raw data. How much space this takes up on disk depends on how much you parse and enrich it, as well as on your mappings. Even if it did not shrink at all, it is well within what a single node can handle (as long as you are not looking for HA or the resiliency a cluster can provide). If you are on Elasticsearch 7.7 or above, there have been improvements in how much heap is required, so you may get away with less than you used to. I would guess a single node with 32GB RAM, a 16GB heap, and 4 CPU cores might be sufficient as long as you have good, performant storage, but you may be able to get by with even less.
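As a back-of-the-envelope version of the numbers above (the expansion factor is an assumption; measure it on a sample of your own data with your actual mappings):

```python
# Back-of-the-envelope storage estimate for the numbers discussed above.
raw_per_day_gb = 100       # raw log volume per day
retention_days = 30        # retention for the main index
expansion_factor = 1.0     # assumed on-disk size relative to raw; measure this!

total_gb = raw_per_day_gb * retention_days * expansion_factor
print(f"Primary storage needed: ~{total_gb / 1000:.1f} TB")   # ~3.0 TB
```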