Is 3k search/sec high volume? (High CPU usage)

Depending on the version, this doesn't kick in properly after the index is created. I don't have a link handy.

What you describe is fairly normal when the cluster is at the edge of its capacity.

Your index is fairly small, so I'm not surprised there's no IO load.

The hot_threads API isn't helping much here; it doesn't do a good job when you have many short-running jobs. Your best bet is to run jstack on a node several times in a row while it's under load and analyze the dumps.
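One way to do that analysis is to take a handful of dumps, keep only threads in the RUNNABLE state, and count which top-of-stack frames keep showing up. The Python below is a minimal sketch of that idea, assuming jstack is on the PATH and you know the Elasticsearch PID. The function names are hypothetical, and the "[search]" filter assumes your search pool threads are named like elasticsearch[node][search][T#1], so check that against your own dumps.

```python
import collections
import re
import subprocess
import time

# Each thread in a jstack dump starts with its quoted name, e.g.
# "elasticsearch[node-1][search][T#3]" #42 daemon prio=5 ... runnable
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)"')


def runnable_stacks(dump, depth=5, name_filter=None):
    """Yield the top `depth` frames of every RUNNABLE thread in one dump."""
    name, runnable, frames = None, False, []
    # A fake header at the end flushes the last thread in the dump.
    for line in dump.splitlines() + ['"<end>"']:
        header = THREAD_HEADER.match(line)
        if header:
            if name and runnable and frames and (name_filter is None or name_filter in name):
                yield tuple(frames[:depth])
            name, runnable, frames = header.group("name"), False, []
        elif "java.lang.Thread.State: RUNNABLE" in line:
            runnable = True
        elif line.strip().startswith("at "):
            frames.append(line.strip()[3:])


def take_dumps(pid, count=10, interval_s=1.0):
    """Run jstack `count` times, `interval_s` seconds apart."""
    dumps = []
    for i in range(count):
        result = subprocess.run(["jstack", str(pid)], capture_output=True, text=True, check=True)
        dumps.append(result.stdout)
        if i < count - 1:
            time.sleep(interval_s)
    return dumps


def hottest(dumps, top=10, **kwargs):
    """Count identical stack prefixes across all dumps; frequent ones show where CPU goes."""
    counts = collections.Counter()
    for dump in dumps:
        counts.update(runnable_stacks(dump, **kwargs))
    return counts.most_common(top)
```

Usage would look like hottest(take_dumps(12345), name_filter="[search]"), where 12345 stands in for the node's PID. Counting repeated stack prefixes rather than reading one dump by eye is what makes many short-running tasks visible: a single dump catches each task only by chance, but the same frames recurring across ten dumps is hard to miss.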

You'll have to post example search queries for us to help with those. Depending on what you are doing, match_all might not be a great indicator: if fetching from _source is taking a while, for example, then switching to match_all isn't going to change anything. Really, the stack traces are your best bet for figuring out what is going on.

Another thing to check is jstat -gcutil <pid> 3s 100. You can use that to figure out how much time is being spent in GC. It's harder to figure out what is taking up the memory, but with the queries you could probably puzzle it out.
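To turn that jstat output into a number, note that the GCT column is the cumulative total GC time in seconds, so the difference between consecutive samples divided by the sampling interval approximates the share of each interval spent in GC. The Python below is a rough sketch of that arithmetic; the function names are hypothetical, and it looks up GCT by header name because some newer JDKs add extra columns (such as CGC/CGCT) to -gcutil.

```python
import subprocess


def run_jstat(pid, interval_ms=3000, count=100):
    """Same as `jstat -gcutil <pid> 3s 100` with the defaults; blocks for interval * count."""
    cmd = ["jstat", "-gcutil", str(pid), f"{interval_ms}ms", str(count)]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def gc_time_fractions(output, interval_ms=3000):
    """Fraction of each sampling interval spent in GC, from the cumulative GCT column."""
    rows = [line.split() for line in output.splitlines() if line.strip()]
    header = rows[0]
    gct = header.index("GCT")  # YGCT, FGCT, CGCT are separate tokens, so this finds total GC time
    # Skip any repeated header rows (e.g. if jstat was run with -h).
    totals = [float(row[gct]) for row in rows[1:] if row != header]
    interval_s = interval_ms / 1000.0
    return [(later - earlier) / interval_s for earlier, later in zip(totals, totals[1:])]
```

For example, gc_time_fractions(run_jstat(12345)) with a placeholder PID returns one value per interval; 0.1 would mean roughly 10% of that 3-second window went to GC. Watching whether YGC or FGC is the column climbing alongside it tells you which kind of collection is responsible.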