No, we are thinking of using 10 nodes with 1000 shards.
We want each shard to be 75GB.
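For context, here is a rough sketch of the sizing arithmetic those numbers imply. It assumes decimal units (1 TB = 1000 GB), shards spread evenly across nodes, and that the 1000 shards already include any replicas. The function name `storage_plan` is only illustrative.

```python
def storage_plan(nodes, shards, shard_size_gb):
    """Back-of-the-envelope storage needs for an evenly balanced cluster."""
    total_gb = shards * shard_size_gb          # data across the whole cluster
    return {
        "shards_per_node": shards / nodes,     # assumes even shard allocation
        "total_tb": total_gb / 1000,           # decimal TB (assumption)
        "per_node_tb": total_gb / nodes / 1000,
    }

plan = storage_plan(nodes=10, shards=1000, shard_size_gb=75)
print(plan)
```

So each node would need to hold roughly 7.5 TB of shard data, before any headroom for merges or disk watermarks.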
But the real question here is...
By default, I was assuming we'd have to use managed disks (with the DS5v2 VM SKU), since we can't rely on the local temporary storage, which could be wiped at any time. Attached managed storage seems to be how most people online talk about supporting this scenario.
However, the Lsv2-series has both temporary storage and NVMe disks. Its spec sheet describes it as ideal for "Big Data, SQL, and NoSQL databases", which seems to fit our problem space.
Should we use DS5v2 (with managed disks) or L32s (NVMe storage)?