Is the join datatype appropriate for this scenario?


(Zeigernz) #1

Hi,
I am trying to model the following use case in Elasticsearch and would like your opinion on my use of the join datatype.

There are many users in my system, let's say they are: UserA, UserB, UserC. They all belong to an organisation called Org1.

Users can create folders, and each folder contains a very large number of documents (millions) that are indexed in Elasticsearch. For reasons I won't go into now, each folder has its own dedicated index, and the documents within a folder are indexed in that folder's index.

This is what the index names look like:
Org1-Folder1
Org1-Folder2
Org1-Folder3

Now comes the fun part.

Folders can be shared between users. So UserA can share a folder they created with UserB.

Now as UserB, I want a "global search" that allows me to search:

  • through a specific folder (i.e. query over the index Org1-Folder1)
  • through every folder I have access to (i.e. query using the wildcard index name Org1-*, somehow restricted to only the folders I have access to)
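
To make the two search modes concrete, here is a rough Python sketch of how I would pick the index to query. The helper name search_target is made up for illustration. One thing I am assuming here: Elasticsearch only accepts lowercase index names, so Org1-Folder1 would really have to be org1-folder1.

```python
from typing import Optional


def search_target(org: str, folder: Optional[str] = None) -> str:
    """Return the index name (or wildcard pattern) a search should run against.

    Hypothetical helper: with a folder it targets that folder's index,
    without one it targets every index of the organisation.
    """
    # Index names are lowercased because Elasticsearch rejects uppercase ones.
    if folder is None:
        return f"{org}-*".lower()
    return f"{org}-{folder}".lower()


print(search_target("Org1", "Folder1"))  # org1-folder1
print(search_target("Org1"))             # org1-*
```

The wildcard on its own matches every folder in the organisation, so the access check still has to come from the query itself.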

One way to do this could be to index one extra document in every index, containing an array of the users who have access to "this" folder. This is the "parent". The index also contains ALL the documents that belong to the folder, and each of these is a "child".

i.e. Index Org1-Folder1 contains:

  • one document that contains a parent field with information like: {"users": ["UserA", "UserB"]}
  • millions of other documents that belong to this parent
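
As a sketch of what I have in mind, this is roughly the mapping and the two kinds of documents, written as Python dicts that could be sent as request bodies. The field names folder_join, users and content, the relation names folder and document, and the parent id folder-meta are all my own placeholders, not anything that exists yet.

```python
# Mapping for one folder index: a keyword field for the user list and a
# join field declaring "folder" as the parent of "document".
INDEX_BODY = {
    "mappings": {
        "properties": {
            "users": {"type": "keyword"},
            "content": {"type": "text"},
            "folder_join": {
                "type": "join",
                "relations": {"folder": "document"},
            },
        }
    }
}

# Hypothetical fixed id for the single parent document in each index.
PARENT_ID = "folder-meta"


def parent_doc(users):
    """The one parent document holding the folder's access list."""
    return {"users": list(users), "folder_join": {"name": "folder"}}


def child_doc(content):
    """A regular document in the folder, linked to the parent."""
    return {
        "content": content,
        "folder_join": {"name": "document", "parent": PARENT_ID},
    }


print(parent_doc(["UserA", "UserB"]))
print(child_doc("some document text"))
```

As far as I understand the join datatype, every child has to be indexed with routing set to the parent id (here PARENT_ID) so that parent and children end up on the same shard.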

The "global search" for a user is then a query using the wildcard index name Org1-* for all documents whose parent lists the matching user. This would mean users can easily search everything they have access to. And when "sharing" changes (i.e. a folder is shared with another user), only one document in that folder's index needs to change (rather than millions of documents).
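
For illustration, I think the query body would look something like the sketch below, built as a Python dict. It uses a has_parent query as a filter; the relation name folder and the fields users and content are placeholders I picked, and global_search_body is just a hypothetical helper.

```python
def global_search_body(user: str, text: str) -> dict:
    """Build a search body that only matches children of folders shared with `user`."""
    return {
        "query": {
            "bool": {
                # The user's actual search terms.
                "must": [{"match": {"content": text}}],
                # Keep only documents whose parent lists this user.
                "filter": [
                    {
                        "has_parent": {
                            "parent_type": "folder",
                            "query": {"term": {"users": user}},
                        }
                    }
                ],
            }
        }
    }


body = global_search_body("UserB", "quarterly report")
print(body["query"]["bool"]["filter"][0]["has_parent"]["parent_type"])  # folder
```

The same body would be sent against either a single folder index or the organisation-wide wildcard.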

Do you think this approach will work, with thousands of folders, each containing potentially millions of documents? What are the potential limitations of doing this?

I really want to avoid having to modify millions of documents every time a folder's sharing settings change.
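
Sharing a folder would then, I assume, come down to one scripted update of the parent document, something like this sketch of an update request body. The helper name share_update_body and the users field are my own placeholders, and the script is Painless.

```python
def share_update_body(user: str) -> dict:
    """Build an update body that adds `user` to the parent's access list once."""
    return {
        "script": {
            "lang": "painless",
            # Only append the user if they are not already in the list.
            "source": (
                "if (!ctx._source.users.contains(params.user)) "
                "{ ctx._source.users.add(params.user) }"
            ),
            "params": {"user": user},
        }
    }


print(share_update_body("UserC")["script"]["params"])  # {'user': 'UserC'}
```

This would be applied to the single parent document of the folder's index, leaving the child documents untouched.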

Thank you.

