I have 4 EC2 machines in an Elasticsearch cluster.
Configuration: c5d.large, Memory: 3.5G, Data Disk: 50GB NVMe instance storage.
Elasticsearch Version: 6.8.21
I added a 5th machine with the same configuration (c5d.large, Memory: 3.5G, Data Disk: 50GB NVMe instance storage). Since then, search requests have been taking longer than before. I enabled slow logs, and they show that only the shards hosted on the 5th node are slow to search. I can also see high disk read I/O on the new node whenever I trigger search requests; the iowait% grows with the number of search requests and reaches 90-95%. None of the old nodes show any read spikes.
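For reference, the search slow log can be enabled per index with dynamic settings along these lines (the index name my-index and the thresholds are placeholders, not the values actually used here), and read I/O and iowait can be watched with iostat while firing the searches:

```
# Enable the search slow log on one index (thresholds are examples)
curl -XPUT 'http://localhost:9200/my-index/_settings' \
  -H 'Content-Type: application/json' -d '
{
  "index.search.slowlog.threshold.query.warn": "1s",
  "index.search.slowlog.threshold.query.info": "500ms",
  "index.search.slowlog.threshold.fetch.warn": "500ms"
}'

# Watch per-device read throughput and CPU %iowait on each node
iostat -x 5
```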
I checked the elasticsearch.yml, jvm.options, and even the sysctl -A output. There is no difference between the configuration on the new node and the old nodes.
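For anyone repeating this comparison, a minimal sketch, assuming SSH access, placeholder hostnames es-old-1 and es-new-5, and a package install with config under /etc/elasticsearch:

```
# Diff the Elasticsearch configs and kernel settings between an old and the new node
diff <(ssh es-old-1 'cat /etc/elasticsearch/elasticsearch.yml') \
     <(ssh es-new-5 'cat /etc/elasticsearch/elasticsearch.yml')
diff <(ssh es-old-1 'cat /etc/elasticsearch/jvm.options') \
     <(ssh es-new-5 'cat /etc/elasticsearch/jvm.options')
diff <(ssh es-old-1 'sysctl -a 2>/dev/null | sort') \
     <(ssh es-new-5 'sysctl -a 2>/dev/null | sort')
```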
Shard relocation had already completed, and I waited another 20 minutes for the CPU to stabilize before triggering the search requests. The disk reads spike only when I trigger search requests, and only on the new machine.
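Relocation status can be confirmed from the cluster itself, for example (assuming the default HTTP port 9200 on localhost):

```
# Both counters should be 0 once shard shuffling is done
curl -s 'http://localhost:9200/_cluster/health?pretty' \
  | grep -E 'relocating_shards|initializing_shards'

# Any shard not in the STARTED state is still moving or recovering
curl -s 'http://localhost:9200/_cat/shards?v' | grep -v STARTED
```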
I tried with a freshly provisioned VM and got the same issue, so there is nothing wrong with that particular VM.
Well, I did find a difference in lscpu between the old and new VMs: the new VM has a better CPU. Please find the lscpu output below. The new VM also has one extra CPU flag, invpcid_single, and is missing the hle and rtm flags compared to the old VMs.
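A quick way to compare the flag sets directly, with es-old-1 and es-new-5 as placeholder hostnames (hle and rtm are the Intel TSX flags):

```
# Diff the full flag lists reported by lscpu on both machines
diff <(ssh es-old-1 "lscpu | grep -i '^flags'") \
     <(ssh es-new-5 "lscpu | grep -i '^flags'")

# Or count how many logical CPUs report each flag of interest locally
for f in hle rtm invpcid_single; do
  printf '%s: %s\n' "$f" "$(grep -cw "$f" /proc/cpuinfo)"
done
```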
Perhaps on the other nodes the data is already cached in RAM...
This is not the case. I restarted the Elasticsearch process on the old VMs, and even after the restart I do not see any disk spikes. Does Elasticsearch maintain some kind of precomputed data or cache on disk?
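Two checks that may help narrow this down, as a sketch assuming localhost:9200 and root access. Elasticsearch's query, request and fielddata caches live on the JVM heap and are lost on restart, but the Linux page cache belongs to the kernel and survives an Elasticsearch restart, so dropping it on an old node is a more direct test of the caching hypothesis:

```
# Elasticsearch-level caches (heap-resident, cleared by a process restart)
curl -s 'http://localhost:9200/_nodes/stats/indices/query_cache,request_cache,fielddata?pretty'

# Drop the kernel page cache on an old node (requires root, temporarily
# degrades performance), then re-run the same search and watch whether
# disk reads now spike there as well.
sync && echo 3 | sudo tee /proc/sys/vm/drop_caches
iostat -x 5
```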