Real-world Elasticsearch sizing

I have been running the whole ELK stack on a single VM with 6 cores and 20GB of RAM, and all is well.

I am looking to scale up the amount of data we store. The actual message rate will stay the same; we just want to keep months of data instead of days.

On this basis, I believe the bottleneck is not going to be Logstash (messages per second is not going up) but Elasticsearch.
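As a rough illustration of why retention, rather than ingest rate, drives the sizing here, the disk requirement can be sketched as daily volume times retention days. The 20 GB/day figure below is a made-up placeholder, not a number from this thread, and the 15% headroom is an assumption chosen to stay under Elasticsearch's default 85% low disk watermark.

```python
def required_disk_gb(daily_gb, retention_days, replicas=0, headroom=0.15):
    """Estimate the disk needed to retain `retention_days` of indexed data.

    daily_gb  -- indexed data written per day (hypothetical input)
    replicas  -- replica copies per primary shard; each one duplicates the data
    headroom  -- fraction of the disk to keep free (assumed, not measured)
    """
    primary = daily_gb * retention_days
    with_replicas = primary * (1 + replicas)
    # Divide by the usable fraction so the data fits below the free-space margin.
    return with_replicas / (1 - headroom)


# Same message rate, longer retention: only the disk requirement grows.
for days in (7, 30, 90, 180):
    print(f"{days:>3} days: {required_disk_gb(20, days):,.0f} GB")
```

Setting `replicas=1` doubles the estimate, which is the extra disk cost of redundancy.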

I can scale vertically and just give the instance more CPU, disk I/O and space, and memory (simple changes to make), or I could scale out into multiple VMs. Scaling out would give some redundancy but also wastes more disk space, so I probably don't want to do it unless I have to: we have no need for HA, and messages can queue during reboots or problems. Does anyone have any real-world examples of large Elasticsearch instances? How big is too big?

Horizontal scaling isn't just for improving availability. Shards will be distributed evenly among the cluster's data nodes regardless of whether you have replicas.
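To make that concrete, here is a toy sketch of primary shards being spread across data nodes with no replicas involved. It uses plain round-robin, which is a simplification and not Elasticsearch's actual allocator; the node names and the shard count are invented for the example.

```python
from collections import Counter


def spread_shards(num_shards, nodes):
    """Assign shard i to nodes[i % len(nodes)] (toy round-robin only)."""
    return {shard: nodes[shard % len(nodes)] for shard in range(num_shards)}


# Hypothetical cluster: 10 primary shards, 3 data nodes, 0 replicas.
assignment = spread_shards(10, ["node-1", "node-2", "node-3"])
per_node = Counter(assignment.values())
print(dict(per_node))
```

Each node ends up holding roughly a third of the shards, so indexing and search work is shared even without replicas.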

Elasticsearch is designed to scale horizontally, and that is the recommended approach, especially if you want to leverage the replication feature. As previously stated, in almost all cases multiple nodes are the way to go for a production deployment.

Sure, but the issue I have is that any additional VMs will almost certainly be sharing the same hardware and physical disks. Is it still best to scale that way? How horizontal or vertical should I go?

E.g. let's imagine I have 80GB of RAM, 24 cores and 10TB of disk to play with.

So I could go Huge:

1 VM with 24 cores, 80GB of RAM and 10TB of disk

Big:

3 VMs with 8 cores, 25GB of RAM and 3.3TB of disk each

Medium:

10 VMs with 2 cores, 8GB of RAM and 1TB of disk each

or Micro:

20 VMs with 1 core, 4GB of RAM and 500GB of disk each

Again, they will likely be sharing the same hardware.
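For reference, a small sketch that totals up each of the four layouts from the per-VM numbers given above. The dictionary keys and the function name are just labels made up for this example.

```python
# Per-VM figures for each layout, copied from the post above.
layouts = {
    "Huge":   {"vms": 1,  "cores": 24, "ram_gb": 80, "disk_tb": 10.0},
    "Big":    {"vms": 3,  "cores": 8,  "ram_gb": 25, "disk_tb": 3.3},
    "Medium": {"vms": 10, "cores": 2,  "ram_gb": 8,  "disk_tb": 1.0},
    "Micro":  {"vms": 20, "cores": 1,  "ram_gb": 4,  "disk_tb": 0.5},
}


def totals(layout):
    """Multiply the per-VM resources by the number of VMs."""
    n = layout["vms"]
    return {
        "cores": n * layout["cores"],
        "ram_gb": n * layout["ram_gb"],
        "disk_tb": round(n * layout["disk_tb"], 1),
    }


for name, layout in layouts.items():
    print(f"{name:<6} {totals(layout)}")
```

As written, the layouts do not all add up to the same budget: Big comes to 75GB of RAM, and Medium and Micro each use 20 of the 24 cores.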

If all nodes would end up on the same hardware anyway, it probably makes sense to use as few nodes as possible. If the hardware also needs to host Logstash, a single Elasticsearch node with around 64GB of RAM and 30GB heap may be the way to go.
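The 64GB RAM / 30GB heap suggestion matches the usual rule of thumb: give the JVM heap about half of the node's RAM, capped at roughly 30GB so the JVM can keep using compressed object pointers, and leave the rest to the OS file system cache. A minimal sketch of that rule follows; the function name and the exact 30GB cap are assumptions for illustration, not an official formula.

```python
def suggested_heap_gb(node_ram_gb, heap_cap_gb=30):
    """Half of the node's RAM for the JVM heap, capped at `heap_cap_gb`.

    The remainder is left for the OS page cache, which Lucene relies on.
    The cap is an assumed round number below the compressed-pointer limit.
    """
    return min(node_ram_gb / 2, heap_cap_gb)


for ram in (20, 64, 80):
    heap = suggested_heap_gb(ram)
    print(f"{ram}GB node: heap {heap:g}GB, {ram - heap:g}GB left for OS cache")
```

Under this rule a single node gains no extra heap beyond about 60GB of RAM; anything above that only adds file system cache.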