Is 3k searches/sec high volume? (High CPU usage)

Hi
I have an 8-node Elasticsearch cluster. Each node has 6 to 8 CPUs, 16GB of memory, and SSDs.
I am running search queries against an index that has 2 million documents in 8 shards + 1 replica. Document size is moderate, so the index size is around 2GB.
Current traffic is about 3000 search queries per second, and overall CPU usage is around 50%. If the query rate goes above 4000/s, some nodes reach 100% CPU and start dropping queued requests, which causes application failures.
There is no indexing during this period.
Each query takes less than 50ms. I tried to optimize the search query, but even a simple match_all query uses almost half the CPU of the current queries, which is still too high.
One interesting thing is that if I optimize the index with max_num_segments=1, CPU usage drops to half. So I reduced segments_per_tier to 3, but it didn't help.
Is this the normal capacity of Elasticsearch, or is there something wrong with my cluster?
I tried both the Oracle JDK and OpenJDK, and the results are similar on both.
This is the hot threads dump.

Depending on the version, this doesn't kick in properly after the index is created. I don't have a link.

What you describe is fairly normal for a cluster running at the edge of its capacity.
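
To put a rough number on that, here is a back-of-the-envelope sketch that uses only the figures from the question (8 nodes, 6 to 8 cores each, about 50% CPU at 3000 queries/s). It assumes load is spread evenly across nodes, which may not hold, and the function name is just for illustration.

```python
# Rough CPU budget per query, using only the figures from the question.
# Assumes load is spread evenly across the nodes, which it may not be.

def cpu_ms_per_query(nodes, cores_per_node, cpu_utilization, queries_per_sec):
    """Average CPU milliseconds the whole cluster spends on one query."""
    busy_cores = nodes * cores_per_node * cpu_utilization
    return busy_cores / queries_per_sec * 1000.0

for cores in (6, 8):
    ms = cpu_ms_per_query(nodes=8, cores_per_node=cores,
                          cpu_utilization=0.5, queries_per_sec=3000)
    print(f"{cores} cores/node: about {ms:.1f} ms of CPU per query")
```

That works out to roughly 8 to 11 ms of CPU per query across the cluster, so there is not a lot of headroom left per query before the cores saturate.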

Your index is fairly small, so I'm not surprised that I don't see any IO load.

The hot_threads output isn't much help here. It doesn't do a good job when you have many short-running jobs. Your best bet is to run jstack on a node several times in a row while it's under load and analyze that.
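
Counting which frames sit at the top of the search threads across several jstack dumps can be scripted. This is a minimal sketch, assuming the search threads have "[search]" in their name (adjust name_filter if yours differ). The sample dump and its org.example frames are made up for illustration.

```python
import re
from collections import Counter

# Matches a stack frame line such as "\tat pkg.Class.method(File.java:12)".
FRAME = re.compile(r"^\s+at\s+(\S+)\(")

def top_frames(dump_text, name_filter="[search]"):
    """Count the innermost frame of each RUNNABLE thread whose name matches."""
    counts = Counter()
    # jstack separates thread entries with blank lines.
    for block in re.split(r"\n\s*\n", dump_text):
        lines = block.strip().splitlines()
        if not lines or not lines[0].startswith('"'):
            continue  # not a thread entry
        if name_filter not in lines[0]:
            continue
        if not any("java.lang.Thread.State: RUNNABLE" in l for l in lines[1:3]):
            continue  # skip parked or waiting threads
        for line in lines[1:]:
            match = FRAME.match(line)
            if match:
                counts[match.group(1)] += 1
                break  # only the innermost frame
    return counts

# Made-up sample in the shape of jstack output.
SAMPLE = '''"elasticsearch[node1][search][T#1]" #40 daemon prio=5 tid=0x01 nid=0x1 runnable
   java.lang.Thread.State: RUNNABLE
\tat org.example.Scorer.score(Scorer.java:10)
\tat org.example.Searcher.search(Searcher.java:20)

"elasticsearch[node1][search][T#2]" #41 daemon prio=5 tid=0x02 nid=0x2 waiting on condition
   java.lang.Thread.State: WAITING (parking)
\tat sun.misc.Unsafe.park(Native Method)
'''

for frame, n in top_frames(SAMPLE).most_common(10):
    print(n, frame)
```

For real use, save each jstack run to a file, call top_frames on each file's text, and add the counters together. Frames that keep showing up at the top are where the CPU time is going.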

You'll have to post example search queries for us to help with those. Depending on what you are doing, match_all might not be a great indicator. For example, if fetching _source is what takes the time, then switching to match_all isn't going to change anything. Really, the stack traces are your best bet for figuring out what is up.

Another thing to check is jstat -gcutil <pid> 3s 100. You can use that to figure out how much time is being taken up by GC. It's harder to figure out what is taking up the memory, but with the queries you could probably puzzle it out.
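
To turn the jstat output into a single number, you can compare the GCT column (cumulative GC time in seconds) between the first and last sample. This is a rough sketch, assuming the output has one header row and a fixed sampling interval. The sample numbers are made up, and the exact columns vary by JVM version, so the GCT column is looked up by name.

```python
def gc_time_fraction(jstat_output, interval_seconds):
    """Approximate fraction of wall-clock time spent in GC between the
    first and last sample of `jstat -gcutil` output."""
    rows = [line.split() for line in jstat_output.strip().splitlines()
            if line.strip()]
    header, samples = rows[0], rows[1:]
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    gct = header.index("GCT")  # cumulative GC time in seconds

    def seconds(cell):
        # Some locales print a decimal comma instead of a point.
        return float(cell.replace(",", "."))

    elapsed = interval_seconds * (len(samples) - 1)
    return (seconds(samples[-1][gct]) - seconds(samples[0][gct])) / elapsed

# Made-up sample in the shape of `jstat -gcutil <pid> 3s` output.
SAMPLE = """
  S0     S1     E      O      M     CCS    YGC     YGCT    FGC    FGCT     GCT
  0.00  98.44  35.12  60.21  95.01  90.12    120    1.500     3    0.600    2.100
 97.10   0.00  12.40  60.90  95.01  90.12    125    1.560     3    0.600    2.160
  0.00  96.02  55.73  61.40  95.01  90.12    131    1.640     3    0.600    2.240
"""

print(f"GC time: {gc_time_fraction(SAMPLE, interval_seconds=3):.1%}")
```

If that fraction stays in the low single digits under load, GC is probably not where the CPU is going.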