Seeking some ideas to help diagnose our cluster performance. I'll start with the specs:
Data nodes: 5x 64-core 64GB RAM. Oracle Java 8. FreeBSD. ZFS RAID10 36TB.
Search/ingest nodes: 2x with same specs as above, minus the large RAID10 array.
Kibana nodes: Run as VMs on the search/ingest nodes.
The problem I'm having is poor indexing performance. The indexing rate graph now looks like a sawtooth: ///////, whereas it used to look mostly steady: ---------^---_---^-----. I suspect this may be caused by Java garbage collection.
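To try to confirm the GC theory I'm going to turn on verbose GC logging so pauses can be lined up against the indexing graph. A sketch of what I'd add to jvm.options for Java 8 (the log path is illustrative for our layout):

    # jvm.options -- verbose GC logging for HotSpot 8
    -XX:+PrintGCDetails
    -XX:+PrintGCDateStamps
    -XX:+PrintGCApplicationStoppedTime
    -Xloggc:/var/log/elasticsearch/gc.log
    -XX:+UseGCLogFileRotation
    -XX:NumberOfGCLogFiles=8
    -XX:GCLogFileSize=64m

PrintGCApplicationStoppedTime is the useful one here, since it records total stop-the-world time rather than just collection counts.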
What changed: the systems previously ran Debian Linux with Oracle Java 11 and a giant 72TB RAID0. I wouldn't have expected the performance gains from Java 8 to Java 11 to be this profound, though maybe this is evidence that they are. Either way, the GC messages in the ES logs are too infrequent to correlate with the indexing pattern.
I've used disk benchmarking tools to verify the ZFS volumes can sustain at least 300 MB/s (I was seeing 800 MB/s previously with Debian/RAID0). This is just to say the problem seems to be isolated to Elasticsearch and/or Java performance, not the storage layer.
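For anyone who wants to compare, a fio run along these lines approximates the test (the directory and sizing are illustrative, not my exact invocation):

    # sequential write throughput against the ZFS data pool
    fio --name=seqwrite --directory=/data/elasticsearch \
        --rw=write --bs=1m --size=8g --numjobs=4 --group_reporting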
Unfortunately, for the time being it doesn't seem Java 11 or Java 13 will run on FreeBSD, at least not without some effort; that's still a work in progress. Switching from OpenJDK 8 to Oracle Java 8 did show some improvement.
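For the record, swapping JDKs only meant pointing the ES startup environment at the other install before restarting the service; the path below is illustrative for our boxes:

    # Elasticsearch picks up the JVM from JAVA_HOME at startup
    export JAVA_HOME=/usr/local/jdk1.8.0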
In the meantime I'm interested to know if anyone has ES- or Java-specific tuning ideas to resolve this. The cluster used to ingest 35k EPS and now it barely handles 20k EPS. The only correlation I've found so far is that the ES index memory fluctuates at the same interval as the indexing rate, and I'm not yet certain why.
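This is how I've been watching that memory fluctuation, in case someone can suggest a better signal to look at (host/port are illustrative):

    # per-node JVM heap and indices memory stats
    curl -s 'http://localhost:9200/_nodes/stats/jvm,indices?pretty'

    # quick per-node view of heap and segment memory
    curl -s 'http://localhost:9200/_cat/nodes?v&h=name,heap.percent,segments.memory,segments.index_writer_memory'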