Multi-Tenancy and Indexes / Aliases / Routing

I know that the right way to handle multi-tenancy depends on the use case, but I cannot figure out the best approach for mine, even after reading some examples.

My use case is as follows:

  • Organizations are tenants with many users
  • Organizations own "datasets"
  • Datasets consist of documents containing fields of data parsed from raw files (such as text files, videos, PDFs, and many more)
  • Projects consist of many datasets, and many users can have access to a project
    • Datasets can belong to multiple Projects, not just to one
  • Users within an Organization can have access to multiple Projects
    • and thus to the datasets that belong to those Projects, or even to datasets that do not yet belong to any Project

My thought is that I would like to be able to search specific datasets.

  • If a user searches a project, the search runs across all datasets that belong to that project
  • If a user searches a dataset, the search runs only across that single dataset
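
To make the two search scopes concrete, here is a minimal sketch of what I have in mind, assuming an Elasticsearch-style query DSL and documents that carry a `dataset_id` keyword field. The field names `dataset_id` and `content`, the `PROJECT_DATASETS` mapping, and both function names are hypothetical and only for illustration. It builds the request body and does not call any client.

```python
from typing import Dict, List

# Hypothetical project -> datasets mapping; in practice this would come
# from the application's own database, not from the search cluster.
PROJECT_DATASETS: Dict[str, List[str]] = {
    "project-a": ["dataset-1", "dataset-2"],
    "project-b": ["dataset-2", "dataset-3"],  # dataset-2 is shared
}


def dataset_ids_for(scope_type: str, scope_id: str) -> List[str]:
    """Resolve a search scope to the list of dataset IDs it covers."""
    if scope_type == "dataset":
        return [scope_id]
    if scope_type == "project":
        return sorted(PROJECT_DATASETS[scope_id])
    raise ValueError(f"unknown scope type: {scope_type}")


def build_search_body(scope_type: str, scope_id: str, text: str) -> dict:
    """Build a bool query restricted to the datasets in scope."""
    return {
        "query": {
            "bool": {
                # Full-text match on an assumed `content` field.
                "must": [{"match": {"content": text}}],
                # Non-scoring filter on the assumed `dataset_id` field.
                "filter": [
                    {"terms": {"dataset_id": dataset_ids_for(scope_type, scope_id)}}
                ],
            }
        }
    }
```

A project search and a dataset search then differ only in how many IDs end up in the `terms` filter, regardless of how the documents are spread across indexes.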

My first instinct, of course, is to create one index per dataset, but after reading around I know that might not be a very smart decision. Some datasets may be very large and complex, while others may be small. On top of that, there will be a lot of datasets.

My next instinct is to use routing, but I still have the same problem: what should I use as the index? An organization may be too large for a single index (and there may be tens of thousands of organizations), and datasets may be too numerous for one index each (there may be tens of thousands per organization).
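
To put rough numbers on this, here is a small sketch that counts indexes under the two candidate layouts and shows what a routed search request might look like. The counts are lower bounds taken from the "tens of thousands" figures above. The `org-<id>` index naming and the function names are hypothetical, and I am assuming Elasticsearch-style routing where the search `routing` parameter accepts a comma-separated list of values.

```python
from typing import Dict, List

# Lower bounds for "tens of thousands", as stated in the question.
ORGANIZATIONS = 10_000
DATASETS_PER_ORGANIZATION = 10_000


def index_count(strategy: str) -> int:
    """Number of indexes each layout would need at the assumed scale."""
    if strategy == "per_dataset":
        return ORGANIZATIONS * DATASETS_PER_ORGANIZATION
    if strategy == "per_organization":
        return ORGANIZATIONS
    raise ValueError(f"unknown strategy: {strategy}")


def routed_search_params(org_id: str, dataset_ids: List[str]) -> Dict[str, str]:
    """Request parameters for one index per organization, routed by dataset."""
    return {
        "index": f"org-{org_id}",          # hypothetical naming scheme
        "routing": ",".join(dataset_ids),  # one routing value per dataset
    }


print(index_count("per_dataset"))       # 100000000
print(index_count("per_organization"))  # 10000
print(routed_search_params("42", ["dataset-1", "dataset-2"]))
```

Two caveats with this sketch: documents would have to be indexed with the same routing value that is used at search time, and a filter on the dataset ID would still be needed, because routing only selects shards and different routing values can land on the same shard.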