Request: ecommerce-search-catalog-embedding-128 dataset used in ES 9.3 vs OS 3.5 vector search benchmark

Hi everyone,

I'm looking for the dataset used in the competitive benchmarking study:
"Elasticsearch 9.3 vs OpenSearch 3.5: Vector Search Performance"

Blog post: "OpenSearch vs. Elasticsearch: Throughput for filtered vector search" (Elasticsearch Labs)
GitHub repo: elastic/competitive-benchmarking-studies, directory es-9.3-vs-os-3.5-vector-search (main branch)

The dataset is referenced in the repo as ecommerce-search-catalog-embedding-128:

  • 20 million documents
  • 128-dimensional dense vector embeddings (cosine similarity)
  • Structured metadata fields for filtered vector search (e.g., item validity, availability)
  • Pre-computed ground truth for recall evaluation
  • Two Parquet files: data.parquet and queries.parquet
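To be clear about how I plan to use the ground truth: my assumption is that it holds the exact top-k neighbors per query by cosine similarity, restricted to documents that match the query's filter. If so, recall would be scored roughly as in the sketch below. This is only my own illustration. The function names, the `allowed` filter set, and the toy random data are hypothetical and are not taken from the repo or the blog post.

```python
import math
import random


def normalize(v):
    # Scale to unit length so the dot product equals cosine similarity.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def exact_top_k(query, docs, k, allowed=None):
    """Brute-force ground truth: ids of the k docs most cosine-similar to query.

    `allowed` is a hypothetical set of doc ids that pass the metadata filter
    (e.g. valid and available items); None means no filter.
    """
    q = normalize(query)
    scored = []
    for doc_id, vec in enumerate(docs):
        if allowed is not None and doc_id not in allowed:
            continue
        d = normalize(vec)
        scored.append((sum(a * b for a, b in zip(q, d)), doc_id))
    # Highest similarity first; ties broken by id for determinism.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [doc_id for _, doc_id in scored[:k]]


def recall_at_k(approx_ids, true_ids, k):
    # Fraction of the exact top-k that the approximate search also returned.
    return len(set(approx_ids[:k]) & set(true_ids[:k])) / k


if __name__ == "__main__":
    # Toy stand-in: 500 random 128-dim vectors instead of the real 20M docs.
    random.seed(0)
    docs = [[random.gauss(0, 1) for _ in range(128)] for _ in range(500)]
    query = [random.gauss(0, 1) for _ in range(128)]
    allowed = set(range(0, 500, 2))  # hypothetical filter: even ids only
    truth = exact_top_k(query, docs, 10, allowed)
    print(truth, recall_at_k(truth, truth, 10))
```

Brute force is far too slow at 20 million documents, which is exactly why I'd rather reuse the pre-computed ground truth than regenerate it.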

However, the datasets/ directory is excluded from the repository via .gitignore (commit message: "do not commit datasets"), and the blog post does not provide a download link or source reference.

I'd like to use this dataset to:

  1. Reproduce the benchmark results on our own infrastructure
  2. Extend the comparison to other vector index implementations (ES 9.3.x)

Would it be possible to share this dataset, or point me to an equivalent publicly available dataset that matches this schema? Alternatively, is there a script or tool to generate it?
Any help would be greatly appreciated. Thanks!