Thoughts on an approach of ACL filtering on content


(Swaminathan Rajamohan) #1

Hi All,

The following is my content model.

Documents are associated with user and group ACLs that define the principals who have access to each document.
Each document consists of a set of metadata fields and a large content body (extracted from PDFs, docs, etc.).

The user performing the search must be limited to only the set of documents he/she is entitled to see (as defined by the ACLs on the document). He/she could have access to a document either through a user ACL or through a group the user belongs to.
Both group membership and the ACLs on a document are highly transient: a user's group membership changes quite often, and so do the ACLs on the document itself.

Approach 1
Store the ACLs on the document along with its metadata as a non-stored field. Expand the groups in the ACL into their individual users (since an ACL entry can be a group).
At query time, append a filter to the user query that includes only documents with the user's id in the acls field.

"filter": {
    "query": {
        "term": {
            "acls": "1234"
        }
    }
}
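The group expansion and the per-user filter described in Approach 1 could be sketched roughly as follows. This is only an illustration in plain Python: the helper names expand_acl and acl_filter, the group_members mapping and the sample ids are assumptions made for the example, not part of any Elasticsearch API.

```python
def expand_acl(user_acls, group_acls, group_members):
    """Flatten a document's ACL into a sorted list of user ids.

    user_acls: user ids granted access directly
    group_acls: group ids granted access
    group_members: dict mapping group id -> iterable of member user ids
    """
    allowed = set(user_acls)
    for group_id in group_acls:
        # Groups with no known members contribute no users
        allowed.update(group_members.get(group_id, ()))
    return sorted(allowed)


def acl_filter(user_id):
    """Build the term filter shown in the post for one searching user."""
    return {"filter": {"query": {"term": {"acls": user_id}}}}


# Hypothetical sample data
group_members = {"eng": ["1234", "5678"], "ops": ["9012"]}
acls = expand_acl(["1111"], ["eng"], group_members)
print(acls)  # ['1111', '1234', '5678']
print(acl_filter("1234"))
```

The expanded list is what would be indexed in the acls field, which is why any membership change forces the document to be written again.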

The problem I see with this approach is that documents need to be re-indexed even though their metadata and content have not changed:

Every time a user's group membership changes
Every time the ACL on the document changes (permission changed for the document)

I am assuming that this will lead to a large number of segment creations and merges, especially since the document body (one of the fields of the document) is a fairly large block of text.
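To make the re-indexing fan-out concrete, here is a small sketch of which documents would have to be rewritten when one group's membership changes. The function name docs_to_reindex and the in-memory doc_group_acls mapping are hypothetical; a real system would read this from wherever the ACLs are mastered.

```python
def docs_to_reindex(changed_group, doc_group_acls):
    """Return ids of documents whose ACL names the changed group.

    doc_group_acls: dict mapping document id -> set of group ids in its ACL
    """
    return sorted(
        doc_id
        for doc_id, groups in doc_group_acls.items()
        if changed_group in groups
    )


# Hypothetical sample data
doc_group_acls = {
    "doc-1": {"eng"},
    "doc-2": {"eng", "ops"},
    "doc-3": {"ops"},
}
print(docs_to_reindex("eng", doc_group_acls))  # ['doc-1', 'doc-2']
```

Every id returned here means re-indexing the full document, large body included, even though only the expanded ACL changed.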

Approach 2:
This is a modification of Approach 1. It attempts to limit updates to the document when the changes are strictly ACL related.

Instead of defining the ACLs alongside the metadata, this approach involves creating multiple types.

In the Document Index

Document (with metadata & text body) as a parent

id
text

userschild Document (parent id & user acls only). This document will exist for each parent.

id
parentid
useracls

groupschild Document (parent id & group acls only). This document will exist for each parent that has group acls.

id
parentid
groupacls

In the Users Index
An entry for each user in the system with the groups he/she is associated with

User
   id
   groups
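As a rough sketch of this layout, one source document could be split into the parent and its ACL children like this. The field names (id, text, parentid, useracls, groupacls) come from the lists above; the function name split_document and the "-users" / "-groups" id suffixes are assumptions made for illustration.

```python
def split_document(doc_id, text, user_acls, group_acls):
    """Split one source document into the parent and its ACL child documents.

    Returns (parent, children) where children is a list of (type, body) pairs.
    """
    parent = {"id": doc_id, "text": text}
    # Child ids derived from the parent id; this naming scheme is an assumption
    users_child = {
        "id": f"{doc_id}-users",
        "parentid": doc_id,
        "useracls": sorted(user_acls),
    }
    children = [("userschild", users_child)]
    if group_acls:
        # A groupschild exists only for parents that have group ACLs
        groups_child = {
            "id": f"{doc_id}-groups",
            "parentid": doc_id,
            "groupacls": sorted(group_acls),
        }
        children.append(("groupschild", groups_child))
    return parent, children


parent, children = split_document("42", "large body text", ["1234"], ["eng"])
print(parent)
for child_type, body in children:
    print(child_type, body)
```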

The idea here is that updates are now localized to the different Elasticsearch entities.
When user ACLs change, only the userschild document is updated (avoiding a potentially costly update to the parent document).
When group ACLs change, only the groupschild document is updated (again avoiding a potentially costly update to the parent document).
When a user's group membership changes, only the Users index is updated (again avoiding an update to the parent document).
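The routing of changes to entities can be summarised in a small lookup. The type names userschild, groupschild and user and the index name users come from the post; the change-kind labels, the index name document_index and the parent type name document are hypothetical placeholders.

```python
# (index, type) that must be rewritten for each kind of change.
# "document_index" and "document" are placeholder names, not from the post.
UPDATE_TARGETS = {
    "user_acl": ("document_index", "userschild"),
    "group_acl": ("document_index", "groupschild"),
    "group_membership": ("users", "user"),
    "content": ("document_index", "document"),
}


def entity_to_update(change_kind):
    """Return the (index, type) pair touched by a given kind of change."""
    return UPDATE_TARGETS[change_kind]


for kind in ("user_acl", "group_acl", "group_membership"):
    print(kind, "->", entity_to_update(kind))
```

Only a content change touches the parent document with its large body; the three ACL-related changes each rewrite a small document.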

The query itself will look as follows.

"filter": {
    "query": {
        "bool": {
            "should": [
                {
                    "has_child": {
                        "type": "userschild",
                        "query": {
                            "term": {
                                "useracls": "1234"
                            }
                        }
                    }
                },
                {
                    "has_child": {
                        "type": "groupschild",
                        "query": {
                            "terms": {
                                "groupacls": {
                                    "index": "users",
                                    "type": "user",
                                    "id": "1234",
                                    "path": "groups"
                                }
                            }
                        }
                    }
                }
            ]
        }
    }
}
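For reference, the same filter can be assembled per searching user with a small helper. This is a plain-Python sketch that only builds the request body as a dict; it does not call Elasticsearch. The function name build_acl_filter is hypothetical, and the child field names are passed in as parameters; the example call uses the useracls and groupacls fields from the type definitions above.

```python
import json


def build_acl_filter(user_id, user_field, group_field):
    """Build the two-clause has_child filter for one searching user."""
    user_clause = {
        "has_child": {
            "type": "userschild",
            "query": {"term": {user_field: user_id}},
        }
    }
    group_clause = {
        "has_child": {
            "type": "groupschild",
            "query": {
                "terms": {
                    # Terms lookup: group ids are read from the user's
                    # entry in the users index at query time
                    group_field: {
                        "index": "users",
                        "type": "user",
                        "id": user_id,
                        "path": "groups",
                    }
                }
            },
        }
    }
    return {"filter": {"query": {"bool": {"should": [user_clause, group_clause]}}}}


body = build_acl_filter("1234", "useracls", "groupacls")
print(json.dumps(body, indent=2))
```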

I have doubts about its scalability owing to the nature of the query involved. It contains two has_child queries, one of which wraps a terms query whose values have to be looked up from a separate index. I am considering improving the terms lookup by using fields with doc values enabled.

Will Approach 2 scale? My concerns are around the has_child queries and their scalability.

Could someone clarify my understanding in this regard?

