I'm building a solution that controls access to documents by library. Users gain access to libraries through subscriptions. We want our users to run searches across all of the libraries that they have access to, but not be able to see libraries that they don't have access to. The documents all have the same mappings.
We expect to have hundreds or possibly thousands of libraries eventually, with millions of documents, but initially fewer than 100 libraries and fewer than 100k documents. Most users will have access to fewer than 100 libraries.
I'm trying to evaluate whether the best solution would be to create a separate index for each library.
I have considered mirroring users and their access rights in Elasticsearch so that they can authenticate with a token, but I think this adds unnecessary overhead, and we don't want users logging into Kibana.
Option 1 - Single Index
We could put everything into a single index and control access to each library through our application by adding filters to the query. This would work, but it means that operations such as deleting a library or boosting certain fields would be harder to do on a library-by-library basis.
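To make Option 1 concrete, here is a minimal sketch of how I imagine our backend building the filtered query. It only builds the request body and does not call Elasticsearch. The field names `library_id`, `title` and `body` are placeholders I made up for illustration; our real mappings may differ.

```python
from __future__ import annotations

from typing import Any


def build_filtered_query(user_text: str, allowed_library_ids: list[str]) -> dict[str, Any]:
    """Build a search body restricted to the libraries a user may see.

    Assumes every document carries a keyword field named `library_id`
    (hypothetical) and that `title` and `body` are the searchable text
    fields (also hypothetical).
    """
    if not allowed_library_ids:
        # A user with no subscriptions should get no hits at all.
        return {"query": {"match_none": {}}}

    return {
        "query": {
            "bool": {
                # Scored clause: the user's actual search text.
                "must": [
                    {"multi_match": {"query": user_text, "fields": ["title", "body"]}}
                ],
                # Non-scoring clause: only documents from permitted libraries.
                "filter": [
                    {"terms": {"library_id": sorted(set(allowed_library_ids))}}
                ],
            }
        }
    }
```

The access check lives in the `filter` clause so that it restricts the result set without affecting relevance scores.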
Option 2 - Index per library
Instead, we could create an index per library. This would allow us to change search weighting by library (I think) and would allow us to manage library CRUD operations more easily.
To manage user access to documents within a search query, I was thinking of a solution that aliases the indexes as follows:
[SubscriptionName]-[LibraryName]-[Version]
That way, if a user has a specific subscription, we would be able to run a query against all indexes under that subscription as follows:
GET /Subscription1-*/_search
To get a specific library (through any subscription) we could search as follows:
GET /*-Library1*/_search
And to search a list of libraries we could search as follows:
GET /*-Library1*,*-Library2*,*-Library3*/_search
I think that means that in our backend app, we just need to pass the correct index patterns to the search in order to cover all of the documents that the user has access to.
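As a rough sketch of that backend step, this is how I picture turning a user's library list into the search path. The function name is hypothetical. Note one deliberate difference from the examples above: the pattern here is `*-Library1-*` rather than `*-Library1*`, because I suspect the looser form would also match a library called Library10. That assumes the version suffix is always present and that library names contain no hyphens.

```python
from __future__ import annotations


def search_path_for_libraries(library_names: list[str]) -> str:
    """Build a multi-target search path for the given library names.

    Assumes aliases are named [SubscriptionName]-[LibraryName]-[Version],
    as described above, and that library names contain no hyphens.
    """
    # dict.fromkeys removes duplicates while keeping the original order.
    unique_names = list(dict.fromkeys(library_names))
    if not unique_names:
        # An empty target list must never fall back to searching everything.
        raise ValueError("user has no accessible libraries")

    # The trailing hyphen stops Library1 from also matching Library10.
    patterns = [f"*-{name}-*" for name in unique_names]
    return "/" + ",".join(patterns) + "/_search"
```

For example, `search_path_for_libraries(["Library1", "Library2"])` gives `/*-Library1-*,*-Library2-*/_search`.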
So my questions are:
- Are there limitations to the number of indexes you can search across in a single search query?
- Does using wildcard searches change these limitations at all?
- Is it possible to set different boosts for different indexes and return results for all in a single query?
- Do you have any recommendations on which option is best (or an Option 3 that is better)?