Hardware recommendations

We have an Elasticsearch cluster that needs a hardware refresh. We currently have 5 servers, each with an "AMD EPYC 7702P 64-Core Processor" and 512GB of RAM. These machines are sitting at around 60-80% busy, which is why they need replacing.

The cluster averages about 270 requests a second. I am thinking about 5 machines with either Zen 5 or Zen 5c processors. We have no ML workloads. How do I decide which processors I should look at?

My suggestion would be to focus on RAM and total number of cores first.

One of the most important factors when selecting hardware for Elasticsearch is the storage, as Elasticsearch can be very I/O intensive, especially if the data set does not fit in the operating system page cache or there is a lot of indexing/updates.

Have you analysed what Elasticsearch is spending time on when it is busy? Have you run the hot threads API?
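
For reference, here is a minimal sketch of pulling the hot threads report over HTTP. It assumes a node reachable at a base URL such as http://localhost:9200 with no authentication or TLS (both are assumptions, adjust for your cluster). The helper names `hot_threads_url` and `fetch_hot_threads` are made up for illustration; the `threads`, `interval` and `type` query parameters are the ones the nodes hot threads endpoint accepts.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def hot_threads_url(base_url, threads=3, interval="500ms", kind="cpu"):
    """Build the URL for GET /_nodes/hot_threads (hypothetical helper)."""
    query = urlencode({"threads": threads, "interval": interval, "type": kind})
    return f"{base_url.rstrip('/')}/_nodes/hot_threads?{query}"


def fetch_hot_threads(base_url, **kwargs):
    """Return the plain-text hot threads report for all nodes.

    Assumes an unauthenticated HTTP endpoint; a secured cluster would
    need credentials and TLS settings that are not shown here.
    """
    with urlopen(hot_threads_url(base_url, **kwargs), timeout=30) as resp:
        return resp.read().decode("utf-8")
```

Calling `fetch_hot_threads("http://localhost:9200", threads=5)` a few times while the cluster is busy gives a rough picture of where the search threads spend their time.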

I have Intel Optane drives. Those things are very fast, and I am going to move them over when I buy new hardware.

The comment about number of cores pushes me towards Zen 5c. Currently, with 128 threads, I still have plenty of RAM left over, with the data on disk being 107GB.

james_@elastic01:~$ free -g
               total        used        free      shared  buff/cache   available
Mem:             440          39         252           0         148         397
Swap:              9           0           9
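
The page cache question raised earlier can be checked against these numbers with a small sketch. It parses the `free -g` output above and compares the "available" column with the 107GB on-disk figure. It assumes the 107GB applies to this one node (the post does not say whether it is per node or cluster-wide), and `parse_free` is a hypothetical helper, not part of any tool.

```python
def parse_free(output):
    """Parse `free -g` output into {row_label: {column: value_in_GB}}."""
    lines = [ln for ln in output.strip().splitlines() if ln.strip()]
    columns = lines[0].split()
    table = {}
    for line in lines[1:]:
        label, *values = line.split()
        # Rows such as Swap have fewer columns; zip stops at the shorter list.
        table[label.rstrip(":")] = dict(zip(columns, map(int, values)))
    return table


FREE_OUTPUT = """\
               total        used        free      shared  buff/cache   available
Mem:             440          39         252           0         148         397
Swap:              9           0           9
"""

DATA_ON_DISK_GB = 107  # figure quoted in the post; assumed to be per node

mem = parse_free(FREE_OUTPUT)["Mem"]
headroom_gb = mem["available"] - DATA_ON_DISK_GB
fits_in_cache = DATA_ON_DISK_GB <= mem["available"]
print(f"available={mem['available']}GB data={DATA_ON_DISK_GB}GB "
      f"headroom={headroom_gb}GB fits={fits_in_cache}")
```

Under that assumption the whole data set fits in memory with room to spare, which suggests the load is more likely CPU-bound than I/O-bound.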

I ran the hot threads API, but I don't know how to interpret the output ...

   100.3% [cpu=100.3%, other=0.0%] (501.6ms out of 500ms) cpu usage by thread 'elasticsearch[elastic05][search][T#102]'
     4/10 snapshots sharing following 27 elements
       app//org.apache.lucene.util.packed.MonotonicLongValues.get(MonotonicLongValues.java:40)
       app//org.apache.lucene.util.packed.PackedLongValues.get(PackedLongValues.java:108)
       app//org.apache.lucene.index.OrdinalMap$3.get(OrdinalMap.java:328)
       app//org.elasticsearch.index.fielddata.ordinals.SingletonGlobalOrdinalMapping.ordValue(SingletonGlobalOrdinalMapping.java:69)
       app//org.apache.lucene.index.SingletonSortedSetDocValues.advanceExact(SingletonSortedSetDocValues.java:83)
       org.elasticsearch.join.aggregations.ParentJoinAggregator.prepareSubAggs(ParentJoinAggregator.java:142)
       app//org.elasticsearch.search.aggregations.bucket.BucketsAggregator.buildSubAggsForBuckets(BucketsAggregator.java:172)
       app//org.elasticsearch.search.aggregations.bucket.BucketsAggregator.buildAggregationsForSingleBucket(BucketsAggregator.java:305)
       org.elasticsearch.join.aggregations.ParentToChildrenAggregator.buildAggregations(ParentToChildrenAggregator.java:43)

...
     2/10 snapshots sharing following 25 elements
       app//org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$22.advanceExact(Lucene80DocValuesProducer.java:1019)
       app//org.elasticsearch.index.fielddata.ordinals.SingletonGlobalOrdinalMapping.advanceExact(SingletonGlobalOrdinalMapping.java:74)
       app//org.apache.lucene.index.SingletonSortedSetDocValues.advanceExact(SingletonSortedSetDocValues.java:82)
       org.elasticsearch.join.aggregations.ParentJoinAggregator.prepareSubAggs(ParentJoinAggregator.java:142)
       app//org.elasticsearch.search.aggregations.bucket.BucketsAggregator.buildSubAggsForBuckets(BucketsAggregator.java:172)
       app//org.elasticsearch.search.aggregations.bucket.BucketsAggregator.buildAggregationsForSingleBucket(BucketsAggregator.java:305)
       org.elasticsearch.join.aggregations.ParentToChildrenAggregator.buildAggregations(ParentToChildrenAggregator.java:43)
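
One rough way to read output like this is to count how many snapshots each stack frame appears in, so the frames common to most samples stand out. The sketch below does that for a few lines copied from the trace above. The regular expressions are assumptions based only on the format shown here, and `frame_weights` is a hypothetical helper.

```python
import re
from collections import Counter

# Matches lines like "4/10 snapshots sharing following 27 elements".
SNAPSHOT_RE = re.compile(r"^\s*(\d+)/(\d+) snapshots sharing following (\d+) elements")
# Matches stack frames, with or without the "app//" prefix.
FRAME_RE = re.compile(r"^\s*(?:app//)?([\w.$]+)\(([\w$]+\.java):(\d+)\)")


def frame_weights(hot_threads_text):
    """Sum, per method, the number of snapshots whose stack contained it."""
    weights = Counter()
    current = 0  # snapshot count of the group being read
    for line in hot_threads_text.splitlines():
        snap = SNAPSHOT_RE.match(line)
        if snap:
            current = int(snap.group(1))
            continue
        frame = FRAME_RE.match(line)
        if frame and current:
            weights[frame.group(1)] += current
    return weights


SAMPLE = """\
     4/10 snapshots sharing following 27 elements
       app//org.apache.lucene.util.packed.MonotonicLongValues.get(MonotonicLongValues.java:40)
       org.elasticsearch.join.aggregations.ParentJoinAggregator.prepareSubAggs(ParentJoinAggregator.java:142)
     2/10 snapshots sharing following 25 elements
       app//org.apache.lucene.codecs.lucene80.Lucene80DocValuesProducer$22.advanceExact(Lucene80DocValuesProducer.java:1019)
       org.elasticsearch.join.aggregations.ParentJoinAggregator.prepareSubAggs(ParentJoinAggregator.java:142)
"""

for method, count in frame_weights(SAMPLE).most_common():
    print(count, method)
```

In the trace above, both snapshot groups pass through `ParentJoinAggregator.prepareSubAggs` and `ParentToChildrenAggregator.buildAggregations`, so the busy search threads appear to be spending their time in aggregations over parent-join (parent/child) fields.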