Hi,
We are a small business and we currently provide payment services to 60,000 users. We use Elasticsearch to search our logs. Our current resources for Elasticsearch are as follows:
3 Elasticsearch master nodes, each with 4GB of memory and an 8-core Intel Xeon 2.40GHz CPU.
2 Elasticsearch data nodes, each with 16GB of memory, an 8-core Intel Xeon 2.40GHz CPU, and 240GB of storage.
The amount of data generated per day is 400 megabytes.
We are using Elasticsearch version 6.1.1. Each index has 10 shards: 5 primaries and 5 replicas.
Unfortunately, with this setup we are running out of memory and seeing slowness on our data nodes.
We want to make sure we have at least the minimum resources required for a good experience.
And finally, how many resources would we need for the next 1 million users in the future?
Could you confirm that each node has 240GB of storage but you generate 400GB of data per day? This would mean you can't even store two days' worth of data.
Ok, that makes more sense. I think you have far too many shards, and your cluster performance is suffering as a result. It is worth reading this article:
In particular:
Aim to keep the average shard size between at least a few GB and a few tens of GB. For use-cases with time-based data, it is common to see shards between 20GB and 40GB in size.
If you generate 400MB per day, split across 5 primaries, then each daily shard is only ~80MB, which is hundreds of times smaller than the recommended 20GB-40GB. I think it would be better to have a single primary in each index, and to use monthly indices rather than daily ones. This will enormously reduce your shard count and you should see better performance as a consequence.
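To illustrate the idea, here is a minimal sketch of that change expressed as an index template. It assumes your cluster is reachable at http://localhost:9200 and that your indices follow a hypothetical `logs-*` naming scheme; both are placeholders for this example, so adjust them to match your environment.

```python
import requests
from datetime import date

# Placeholder cluster address; change to your own.
ES = "http://localhost:9200"

# One primary shard plus one replica per index, instead of 5 + 5.
template = {
    "index_patterns": ["logs-*"],
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 1,
    },
}

# Register the template (Elasticsearch 6.x _template API).
resp = requests.put(ES + "/_template/logs_monthly", json=template)
resp.raise_for_status()

# Write to a monthly index, e.g. "logs-2018.05", rather than a daily one.
index_name = "logs-" + date.today().strftime("%Y.%m")
requests.post(ES + "/" + index_name + "/_doc",
              json={"message": "example log line"})
```

Note that a template only applies to indices created after it is registered, so your existing daily indices will keep their current shard counts.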