I have a question that I couldn't really find an answer to:
When using searchable snapshots, is the data stored and read in a "sequential" order that would make HDDs a good fit, or is the recommendation to use SSDs/NVMe for searchable snapshot nodes?
For context, the nodes would be AWS EC2 instances mostly of the "storage optimized" class.
If there are any benchmarks comparing these two storage media, that would be interesting as well.
The data on disk on the node itself is laid out pretty much like any other Lucene data, with the usual expectations about access patterns: multiple concurrent readers, some of which mostly read forwards but may skip ahead rather than reading purely sequentially, while others access the data in a more random pattern. The data is retrieved from the snapshot in fairly large sequential chunks, but again there are multiple concurrent writers doing this, which probably won't play nicely with spinning disks.
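To illustrate why "each reader is mostly sequential" doesn't add up to "sequential on disk", here's a toy model (my own sketch, not anything from Elasticsearch or Lucene). It assumes abstract numbered blocks, a disk queue that serves readers round-robin, and counts a "seek" whenever the next block isn't adjacent to the previous one. Real disks, OS readahead and I/O schedulers merge and reorder requests, so treat this as an intuition aid, not a benchmark.

```python
from itertools import zip_longest


def interleave(streams):
    """Round-robin merge of per-reader block sequences, as a disk queue might see them."""
    order = []
    for batch in zip_longest(*streams):
        # Shorter streams run out first; skip their padding.
        order.extend(b for b in batch if b is not None)
    return order


def count_seeks(block_order):
    """Count transitions where the next block is not adjacent to the previous one."""
    return sum(1 for prev, cur in zip(block_order, block_order[1:]) if cur != prev + 1)


def sequential_stream(start, length):
    """Hypothetical reader that reads `length` consecutive blocks from `start`."""
    return list(range(start, start + length))


# One reader scanning 1000 blocks in order: no seeks at all.
single = [sequential_stream(0, 1000)]

# One reader moving forwards but skipping: every step is a (short) seek.
skipping = [list(range(0, 3000, 3))]

# Four readers, each perfectly sequential within its own file region,
# but interleaved at the disk: nearly every request becomes a seek.
concurrent = [sequential_stream(i * 10_000, 1000) for i in range(4)]

print(count_seeks(interleave(single)))      # 0
print(count_seeks(interleave(skipping)))    # 999
print(count_seeks(interleave(concurrent)))  # 3999
```

On an SSD/NVMe device those "seeks" cost roughly the same as sequential reads; on a spinning disk each one costs head movement, which is the concern with concurrent readers and writers above.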