Thanks for the reply. I was not aware of the async search feature. For something similar, I have used partitions to break up aggregations and retrieve the results via the API for batch reporting, but unfortunately that doesn't seem doable via Kibana visualizations.
That's an interesting point about the CPU caches; I hadn't considered that aspect. I'll have a think about establishing some baseline performance metrics and testing the effects of increasing the heap on a single node before jumping into anything.
Thanks!