I know that how one handles multi-tenancy really depends on the use case, but even after reading some examples I cannot figure out the best approach to take.
My use case is as follows:
- Organizations are tenants with many users
- Organizations own "datasets"
- Datasets consist of documents containing fields of data parsed from raw files (such as text files, videos, PDFs, and many more)
- Projects consist of many datasets, and many users have access to these projects
- Datasets can belong to multiple Projects, not just to one
- Users within an Organization can have access to multiple Projects, and thus to the datasets that belong to them, or even to datasets that do not yet belong to a project.
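To make the access rules concrete, here is a minimal Python sketch of the data model as I understand it. The class names, field names and the `accessible_datasets` helper are hypothetical, and modelling "datasets that do not yet belong to a project" as direct per-user grants is an assumption on my part.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    project_id: str
    organization_id: str
    # Many-to-many: the same dataset ID may appear in several projects.
    dataset_ids: set[str] = field(default_factory=set)


@dataclass
class User:
    user_id: str
    organization_id: str
    project_ids: set[str] = field(default_factory=set)
    # Hypothetical: datasets granted directly, e.g. ones not yet in any project.
    direct_dataset_ids: set[str] = field(default_factory=set)


def accessible_datasets(user: User, projects: dict[str, Project]) -> set[str]:
    """Datasets a user can search: direct grants plus everything in their projects."""
    reachable = set(user.direct_dataset_ids)
    for project_id in user.project_ids:
        project = projects[project_id]
        # Guard against a project from another organization (tenant isolation).
        if project.organization_id == user.organization_id:
            reachable |= project.dataset_ids
    return reachable


projects = {
    "p1": Project("p1", "org-a", {"d1", "d2"}),
    "p2": Project("p2", "org-a", {"d2", "d3"}),
}
alice = User("alice", "org-a", {"p1", "p2"}, {"d4"})
print(sorted(accessible_datasets(alice, projects)))  # ['d1', 'd2', 'd3', 'd4']
```

Dataset `d2` sits in both projects, which is the many-to-many case that makes a simple "one dataset, one project" hierarchy unusable.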
I would like to be able to search specific datasets:
- If a user searches a project, the search should run across all datasets that belong to that project.
- If a user searches a dataset, the search should run only across that single dataset.
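Both search rules reduce to the same step: resolve the scope to a list of dataset IDs, then filter on them. Below is a minimal Python sketch, assuming Elasticsearch (the routing terminology suggests it) and a shared index where every document carries keyword fields `organization_id` and `dataset_id` plus a text field `content`. All three field names and both helper functions are hypothetical. Because a dataset can belong to several projects, the sketch does not store project IDs on documents. It resolves a project to its dataset IDs at query time, so adding a dataset to a project would not require reindexing.

```python
from typing import Any


def resolve_scope(scope_type: str, scope_id: str,
                  project_datasets: dict[str, set[str]]) -> list[str]:
    """Turn a search scope ("project" or "dataset") into dataset IDs to search."""
    if scope_type == "dataset":
        return [scope_id]
    if scope_type == "project":
        # A project search covers every dataset currently linked to it.
        return sorted(project_datasets.get(scope_id, set()))
    raise ValueError(f"unknown scope type: {scope_type!r}")


def build_query(text: str, organization_id: str,
                dataset_ids: list[str]) -> dict[str, Any]:
    """Elasticsearch bool query: full-text match restricted by tenant and datasets."""
    return {
        "bool": {
            "must": [{"match": {"content": text}}],
            # Filter clauses restrict the hits without affecting relevance scores.
            "filter": [
                {"term": {"organization_id": organization_id}},
                {"terms": {"dataset_id": dataset_ids}},
            ],
        }
    }


project_datasets = {"p1": {"d1", "d2"}}
query = build_query("quarterly report", "org-a",
                    resolve_scope("project", "p1", project_datasets))
```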
My first instinct, of course, is to create one index per dataset, but from what I have read that may not be a smart decision: some datasets may be very large and complex while others are small, and on top of that there will be a great many datasets.
My next instinct is to use routing, but I still have the same problem: what should I use as the index? Organizations may be too large (and there may be tens of thousands of them), and datasets may be too numerous (there may be tens of thousands per organization).
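For the routing option, one layout I have seen described is a single shared index with the organization ID as the routing value. The sketch below only builds keyword arguments for the official Python client (8.x assumed, where `index()` takes `document=` and `search()` takes `query=`). The index name `documents`, the field names and the helper functions are hypothetical. Routing only decides which shard a document lands on, and many organizations will share each shard, so searches still need a filter on `organization_id`.

```python
from typing import Any, Optional

SHARED_INDEX = "documents"  # hypothetical name for the one index all tenants share


def index_request(doc_id: str, organization_id: str, dataset_id: str,
                  content: str) -> dict[str, Any]:
    """Keyword arguments for es.index(): one document, routed by organization."""
    return {
        "index": SHARED_INDEX,
        "id": doc_id,
        "routing": organization_id,  # an organization's docs share a routing value
        "document": {
            "organization_id": organization_id,
            "dataset_id": dataset_id,
            "content": content,
        },
    }


def search_request(organization_id: str, query: dict[str, Any],
                   extra_filters: Optional[list[dict[str, Any]]] = None) -> dict[str, Any]:
    """Keyword arguments for es.search(), limited to the organization's routing value."""
    filters = [{"term": {"organization_id": organization_id}}]  # tenant isolation
    filters.extend(extra_filters or [])
    return {
        "index": SHARED_INDEX,
        "routing": organization_id,  # must equal the routing value used at index time
        "query": {"bool": {"must": [query], "filter": filters}},
    }


# Usage with a client instance `es` (not executed here):
#   es.index(**index_request("doc-1", "org-a", "d1", "parsed text"))
#   es.search(**search_request("org-a", {"match": {"content": "invoice"}},
#                              [{"terms": {"dataset_id": ["d1", "d2"]}}]))
```

One caveat with this layout: by default a very large organization puts all of its documents on a single shard. The `index.routing_partition_size` index setting spreads each routing value over a subset of shards, which may help with the uneven organization sizes described above.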