Elasticsearch: high search response times on new machine

I have an Elasticsearch cluster of 4 EC2 machines.
Configuration: c5d.large, memory: 3.5G, data disk: 50GB NVMe instance storage.
Elasticsearch version: 6.8.21

I added a 5th machine with the same configuration: c5d.large, memory: 3.5G, data disk: 50GB NVMe instance storage. Since then, search requests have been taking longer than before. I enabled slow logs, which show that only the shards on the 5th node are slow to search. I can also see high disk read I/O on the new node when I trigger search requests. iowait% increases with the number of search requests and goes up to 90-95%. None of the old nodes show any read spikes.

I checked elasticsearch.yml, jvm.options and even the sysctl -A settings. There is no difference between the configuration on the new node and the old nodes.

What could be the issue here?

Did all the shard shuffling complete?

Were the searches performed during the rebalance?

BTW, 6.8 is EOL; you should consider upgrading as a matter of urgency.
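One way to confirm that rebalancing has actually finished is to ask the cluster directly (a sketch, assuming the cluster is reachable on localhost:9200):

```shell
# Cluster health: relocating_shards and initializing_shards should both be 0
curl -s 'localhost:9200/_cluster/health?pretty'

# List any shards still moving between nodes (no output means none)
curl -s 'localhost:9200/_cat/shards' | grep 'RELOCATING'
```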

Shard shuffling had already completed. I also waited 20 minutes for the CPU to stabilize before triggering search requests. And the disk reads spike only when I trigger search requests, and only on the new machine.

A few details about the index:

{
  "health": "green",
  "status": "open",
  "index": "my-index-name",
  "uuid": "m1ZlUa5USgKAUy1MkSasJg",
  "pri": "12",
  "rep": "1",
  "docs.count": "64542359",
  "docs.deleted": "19423498",
  "store.size": "83gb",
  "pri.store.size": "40gb"
}

Regarding EOL: yes, we are planning to move to the latest version of ES in a few months.

Not sure... you could try a force merge with expunge deletes to clean up the segments.
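Given that the index shows ~19M deleted docs against ~64M live ones, that could look something like the following (a sketch, assuming the index name from above and a cluster on localhost:9200; force merge is I/O-heavy, so run it during a quiet period):

```shell
# Rewrite segments to drop soft-deleted documents without a full merge
curl -s -X POST 'localhost:9200/my-index-name/_forcemerge?only_expunge_deletes=true'
```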

Perhaps on the other nodes the data is already cached in RAM...

Otherwise perhaps something is different with the underlying HW / OS.

I tried with a new VM and hit the same issue, so there is nothing wrong with the provisioned VM itself.

Well, I found a difference in lscpu output between the old and new VMs. The new VM has a newer CPU.
Please find the lscpu output below. The new VM also has one extra CPU flag, invpcid_single, and it is missing the hle and rtm flags compared to the old VMs.

New VM

lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                2
On-line CPU(s) list:   0,1
Thread(s) per core:    2
Core(s) per socket:    1
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz
Stepping:              7
CPU MHz:               2999.998
BogoMIPS:              5999.99
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              36608K
NUMA node0 CPU(s):     0,1
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 ida arat pku ospke

Old VM

lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                2
On-line CPU(s) list:   0,1
Thread(s) per core:    2
Core(s) per socket:    1
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 85
Model name:            Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz
Stepping:              4
CPU MHz:               3000.000
BogoMIPS:              6000.00
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              1024K
L3 cache:              25344K
NUMA node0 CPU(s):     0,1
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 ida arat pku ospke


Regarding this comment,

Perhaps on the other nodes the data is already cached in RAM...

This is not the case. I restarted the Elasticsearch process on the old VMs. Even after the restart, I do not see any disk spikes. Does Elasticsearch maintain some kind of computation/cache on disk?
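One caveat with that test: restarting the Elasticsearch process does not clear the OS page cache, so the old nodes may still be serving index files from RAM even after a restart. To reproduce a genuinely cold cache on an old node you would have to drop the page cache explicitly (a sketch; requires root, and do this only on a node you can afford to slow down temporarily):

```shell
# Flush dirty pages, then drop pagecache, dentries and inodes
sync
echo 3 > /proc/sys/vm/drop_caches
```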