Tiny dataset with a very high read rate: how to optimise?

We are trying to work out how to improve the throughput of our Elasticsearch cluster, and I think our use case is a little different from normal, hence my post! Our cluster is currently 3 nodes with 8GB of memory each. Our index is really small: ~50,000 documents, ~200MB in total.

We deal with quite a large number of requests to the cluster, e.g. 400 requests a second, but this can spike to upwards of 700, at which point the cluster tends to fall over. Our monitoring shows a search time of ~5ms, but CPU simply maxes out once we approach 700 requests per second.

My questions are:

Is it unreasonable to expect this cluster to handle 700 requests per second?

What would be the best shard and replica configuration? I can fairly confidently say the current config (5 primary shards with a single replica) is wrong!

With that little data I would recommend a single primary shard and two replicas, so every node holds a full copy of the index and can serve searches locally. If you are not updating the data set, I would also recommend force-merging it down to a single segment.
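For reference, that setup might look something like the following (a sketch: the index name `my-index-v2` and the `localhost:9200` endpoint are placeholders for your own, and I'm assuming you can reindex into a fresh index rather than reshard in place):

```shell
# Placeholder endpoint and index name - adjust to your cluster.
# Create a new index with one primary shard and two replicas,
# then reindex your data into it (or use the _shrink API instead).
curl -X PUT "localhost:9200/my-index-v2" \
  -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 2
  }
}'

# Once the data is loaded and no longer being updated,
# force-merge down to a single segment:
curl -X POST "localhost:9200/my-index-v2/_forcemerge?max_num_segments=1"
```

With one primary and two replicas on a 3-node cluster, all three nodes can answer any search, which is what you want for a read-heavy workload like this.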