I've been experimenting with my 3-node cluster with a focus on pushing ingestion performance. The data set I use has 3 million lines, ~840MB, going in through Logstash. Although I get decent performance ingesting into an empty index, ~6k lines/second, ingestion slows down as the number of documents in the index grows. After a while I see entries in the Logstash log file indicating the Elasticsearch ingestion endpoint is not responding. Looking at the monitoring tab and running REST query calls with Postman, I see the number of segments fluctuates constantly, seemingly indicating frequent segment merging, which I thought can get expensive as the segments grow in size and cause Elasticsearch to throttle ingestion.
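For reference, these are roughly the kinds of calls I've been running from Postman (shown here as curl; `my_index` is a placeholder for my actual index name, and the cluster is assumed to be on `localhost:9200`):

```shell
# List segments per shard, to watch the segment count fluctuate as merges happen
curl -s "http://localhost:9200/_cat/segments/my_index?v"

# Merge stats for the index -- in particular total_throttled_time_in_millis,
# which would confirm whether merging is actually throttling ingestion
curl -s "http://localhost:9200/my_index/_stats/merge?pretty"

# Hot threads per node, to see whether merge threads are dominating CPU/IO
curl -s "http://localhost:9200/_nodes/hot_threads"
```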
I do have my 3 hosts on VMs sharing a spinning disk managed by VMware ESXi, but before I try switching to an SSD datastore, does anybody have any suggestions on how I can debug on the Elasticsearch side and narrow down or confirm the cause of my slowing ingestion performance?