I know that the right way to handle multi-tenancy depends heavily on the use case, but even after reading some examples I can't figure out the best approach to take.
My use case is as follows:
- Organizations are tenants with many users.
- Organizations own "datasets".
- Datasets consist of documents containing fields of data parsed from raw files (such as text files, videos, PDFs, and many more).
- Projects consist of many datasets, and many users have access to these projects.
- Datasets can belong to multiple Projects, not just to one.
- Users within an Organization can have access to multiple Projects, and thus to the datasets that belong to them, or even to datasets that do not yet belong to a project.
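To make the relationships above concrete, here is a minimal in-memory sketch of the data model and of which datasets a given user could see. All class, field, and function names are hypothetical, and the rule that any organization member can see datasets not yet assigned to a project is an assumption read from the last bullet.

```python
from dataclasses import dataclass, field


@dataclass
class Organization:
    org_id: str
    user_ids: set[str] = field(default_factory=set)
    dataset_ids: set[str] = field(default_factory=set)  # datasets owned by the org


@dataclass
class Project:
    project_id: str
    org_id: str
    dataset_ids: set[str] = field(default_factory=set)  # a dataset may sit in many projects
    user_ids: set[str] = field(default_factory=set)


def accessible_datasets(user_id: str, org: Organization, projects: list[Project]) -> set[str]:
    """Datasets a user can see: those of every project they belong to, plus
    (ASSUMPTION) every org dataset that is not yet assigned to any project."""
    if user_id not in org.user_ids:
        return set()
    org_projects = [p for p in projects if p.org_id == org.org_id]
    via_projects: set[str] = set()
    for p in org_projects:
        if user_id in p.user_ids:
            via_projects |= p.dataset_ids
    assigned = set().union(*(p.dataset_ids for p in org_projects))
    unassigned = org.dataset_ids - assigned
    return via_projects | unassigned


# Hypothetical example: d2 is shared by two projects, d4 is unassigned.
org = Organization("acme", {"alice", "bob"}, {"d1", "d2", "d3", "d4"})
projects = [
    Project("p1", "acme", {"d1", "d2"}, {"alice"}),
    Project("p2", "acme", {"d2", "d3"}, {"bob"}),
]
print(sorted(accessible_datasets("alice", org, projects)))  # ['d1', 'd2', 'd4']
```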
I would like to be able to search specific datasets:

- If a user searches a project, the search runs across all datasets that belong to that project.
- If a user searches a dataset, the search runs only within that single dataset.
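The two search modes above boil down to resolving a set of dataset IDs before querying. A minimal sketch, assuming a hypothetical `project_datasets` mapping from project ID to the dataset IDs linked to it:

```python
def search_scope(target_kind: str, target_id: str,
                 project_datasets: dict[str, set[str]]) -> set[str]:
    """Return the IDs of the datasets a search should cover."""
    if target_kind == "project":
        # A project search fans out to every dataset linked to the project.
        return set(project_datasets.get(target_id, set()))
    if target_kind == "dataset":
        # A dataset search is confined to that single dataset.
        return {target_id}
    raise ValueError(f"unknown search target: {target_kind!r}")


# Hypothetical example: d2 is shared by p1 and p2.
project_datasets = {"p1": {"d1", "d2"}, "p2": {"d2", "d3"}}
print(sorted(search_scope("project", "p1", project_datasets)))  # ['d1', 'd2']
print(sorted(search_scope("dataset", "d3", project_datasets)))  # ['d3']
```

Because a dataset can belong to several projects, the same dataset ID can appear in many project scopes, so whatever index or routing layout is chosen has to make both kinds of scope cheap to target.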
My first instinct, of course, is to create one index per dataset, but from what I've read that is probably not a smart decision: some datasets may be very large and complex while others are small, and there will be a great many datasets.
My next instinct is to use routing, but I still face the same problem: what should each index correspond to? An index per organization may be too large (and there may be tens of thousands of organizations), while an index per dataset means far too many indices (there may be tens of thousands of datasets per organization).
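As a rough back-of-envelope estimate, reading "tens of thousands" as at least $10^4$ for both counts, one index per dataset would mean on the order of:

$$
\underbrace{10^{4}}_{\text{organizations}} \times \underbrace{10^{4}}_{\text{datasets per organization}} = 10^{8}\ \text{indices}
$$

This is only an order-of-magnitude lower bound, but it shows why the per-dataset option scales with the product of the two counts rather than either one alone.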