Access filtering in tree structure


#1

I have to construct a non-trivial access filter for use with all queries on my document index. My documents and related models look like this:

DocumentIndexModel

  • int Id
  • List(int) AncestorIdPath
  • DocumentAccessModel Access

DocumentAccessModel

  • List(Group) Groups
  • List(User) Users

Group

  • int Id

User

  • int Id

The DocumentAccessModel specifies the groups and users that have access to the document. Each user can be in several groups. I have implemented a simple access filter that requires a match on either of the following two queries (NEST), a TermsQuery on the group IDs and a TermQuery on the user ID:

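// Matches documents that list at least one of the current user's groups.
new TermsQuery
{
    Field = Infer.Field<DocumentIndexModel>(d => d.Access.Groups.First().Id),
    Terms = currentUserGroupIds
}

// Matches documents that list the current user directly.
new TermQuery
{
    Field = Infer.Field<DocumentIndexModel>(d => d.Access.Users.First().Id),
    Value = currentUser.Id
}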

which results in the user correctly being able to retrieve only the documents that list him or one of his groups in the access model.

However, and this is the tricky part: I want to expand the filter logic to also require access to all ancestors of the document. In other words, the user should only be able to find the documents that have the user himself or one of his groups in the DocumentAccessModel of the document and each of its ancestors.

To do this, I am considering replacing the DocumentAccessModel on each document with a list of DocumentAccessModels (one for each ancestor of the document), such that access requires that each of these DocumentAccessModels matches either the user directly or one of his groups.

How could such a filter be constructed - and is something like this even possible in Elasticsearch?


(Zachary Tong) #2

I think you'll have to start using nested documents and nested queries to get this sort of functionality. If you just put those DocumentAccessModels into an array, they will lose their "relational" data: all the values become bags of tokens per field, with no record of which DocumentAccessModel each value came from.

But if you use nested documents, each DocAccessModel will become its own internal "nested" document inside the root, and you can express these kinds of tree filters with nested queries.
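
To make the "bags of tokens" point concrete, here is a minimal Python sketch of the difference. It is not Elasticsearch code: `flatten` is a hypothetical helper that only imitates how an array of objects is indexed without the nested type, and the data is made up for illustration.

```python
def flatten(models):
    """Imitate indexing an array of objects without the nested type:
    one bag of values per field, with no record of which object
    each value came from."""
    flat = {}
    for model in models:
        for field, values in model.items():
            flat.setdefault(field, set()).update(values)
    return flat


# Hypothetical data: two access models on one document.
models = [
    {"Groups": [1], "Users": [10]},
    {"Groups": [2], "Users": [20]},
]

flat = flatten(models)

# Question: does ONE access model contain both group 1 and user 20?
# Flattened, the two conditions are checked against separate bags,
# so values from different models can combine into a false positive.
flattened_answer = 1 in flat["Groups"] and 20 in flat["Users"]

# Nested, each access model is checked on its own.
nested_answer = any(1 in m["Groups"] and 20 in m["Users"] for m in models)

print(flattened_answer)  # True
print(nested_answer)     # False
```

The flattened check answers "yes" even though no single access model contains both values, which is the cross-object match that the nested type prevents.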

The caveat is that ES isn't really a relational database, so this kind of functionality is there for convenience in simple situations... if you find yourself leaning on relational-style queries too often, you may need to start denormalizing or rearranging your data.


#3

Yeah, after reading up on the nested datatype, I agree that making the DocumentAccessModels-field nested seems to be a necessity for my proposed solution to work.

Even after doing that though, I am still not sure how to construct a filter that requires a match in each of the DocumentAccessModels in the list. Closest thing I can figure out is requiring a match in any of the DocumentAccessModels in the list.


(Zachary Tong) #4

Ah, right. So for something like that, a match in all DocAccessModels, you may need to denormalize "upwards" into the parent document.

For example, you could copy all the child user IDs into a single field in the root document (using copy_to in Elasticsearch, or just do it in your application). This gives you a single, combined field with all the terms that are represented in the children.

Then you can use the new Terms Set query to query the field and ensure that there is only one ID that matched (if there is more than one ID in the children documents, there will be >1 term and the Terms Set query won't match the root doc).

If you're on an older version of ES, you can simulate that behavior with a script query and manually check the length of terms in a script.

Or you could index an additional field which holds the number of terms, and do a query which is a combination of must: <match term> AND must: num_terms = 1.

It's a little clunky, but makes sense when you understand how ES evaluates queries. When a query is being run, it only ever has access to a single document at a time. Nested documents, which appear relational, are actually just hidden internal documents which are used to provide some extra functionality. So when nested docs are being compared, each individual doc is compared in isolation.

Which makes questions like "do all the nested docs match" hard, since a query never has the ability to look at all of the docs at the same time.

That's why denormalizing "upwards" sometimes helps, since you are putting aggregate info about the children into the root object, which can be evaluated by a query.

Hope that helps! :slight_smile:


#5

I appreciate the feedback very much, but I think you misunderstood my last comment. On each document, I plan to have a list of DocumentAccessModels (denormalized from the document's ancestors, and using the nested datatype). Having access to the document then simply requires all of the DocumentAccessModels on it to include the user (either directly or through one of his groups).

Consider for instance the document below:

"DocumentIndexModel": {
    "Id": 3,
    "AncestorIdPath": [1, 2],
    "AccessModels": [
        {
            "Groups": [1],
            "Users": [1, 2]
        },
        {
            "Groups": [3],
            "Users": []
        },
        {
            "Groups": [2],
            "Users": [2, 3]
        }
    ]
}

The document has three access models, one for each ancestor document and one for itself. In order to have access to the document, the user needs to have access to all ancestors as well - in other words, he needs to be included in each of the access models in the list.

For instance: does the user with id 3 and groupIds 1 and 3 have access to the document above? I know that he does, since he (or at least one of his groups) is included in each element of the AccessModels list. In contrast, the user with id 4 and groupIds 1 and 2 does not have access, since he is not included in the access model in the middle. The data is readily available on the document, but I don't see how I can write a query that checks this - I don't seem to be able to with the Terms Set query. I might be able to with a Script Query, but due to performance considerations, I would like to avoid it if at all possible.

To state the question as precisely as possible: How can I write a query that requires a boolean expression to evaluate to true for all elements in a list on the returned documents?


(Zachary Tong) #6

Thanks for the clarification... I think I finally understand what you're looking for, sorry it took so long :slight_smile:

I believe this should be possible. Here's a reproduction to test it out, see if this works for your needs (I tried a few combinations and it seemed to work based on what you said). First, the mapping and a document:

PUT /test/
{
  "mappings": {
    "_doc": {
      "properties": {
        "Id": {
          "type": "keyword"
        },
        "AncestorIdPath": {
          "type": "keyword"
        },
        "AccessModels": {
          "type": "nested",
          "properties": {
            "Groups": {
              "type": "keyword"
            },
            "Users": {
              "type": "keyword"
            }
          }
        }
      }
    }
  }
}

POST /test/_doc/
{
    "Id": 3,
    "AncestorIdPath": [1, 2],
    "AccessModels": [
        {
            "Groups": [1],
            "Users": [1, 2]
        },
        {
            "Groups": [3],
            "Users": []
        },
        {
            "Groups": [2],
            "Users": [2, 3]
        }
    ]
}

And now the first query you specified (user ID 3, group IDs 1 and 3):

POST /test/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "nested": {
            "path": "AccessModels",
            "query": {
              "bool": {
                "must_not": [
                  {
                    "term": {
                      "AccessModels.Users": {
                        "value": 3
                      }
                    }
                  },
                  {
                    "terms": {
                      "AccessModels.Groups": [1, 3]
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}

The arrangement is a little funky. The innermost boolean must_not finds all nested documents that match neither the user ID nor any of the group IDs, i.e. the access models that do not include the user. The middle nested query then matches all root documents where at least one of its children fails the criteria. Finally, the outermost boolean must_not negates that set, leaving the root documents where no child fails, which is the set of docs where every access model includes the user.
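
The double negation can be checked outside Elasticsearch with a small Python sketch. This only models the logic of the query on the example document; the function names are made up for illustration and nothing here calls Elasticsearch.

```python
# The example document from this thread.
doc = {
    "Id": 3,
    "AncestorIdPath": [1, 2],
    "AccessModels": [
        {"Groups": [1], "Users": [1, 2]},
        {"Groups": [3], "Users": []},
        {"Groups": [2], "Users": [2, 3]},
    ],
}


def model_excludes_user(model, user_id, group_ids):
    """Inner bool/must_not: true for an access model that contains
    neither the user ID nor any of the group IDs."""
    has_user = user_id in model["Users"]
    has_group = bool(set(group_ids) & set(model["Groups"]))
    return not has_user and not has_group


def root_matches(doc, user_id, group_ids):
    """Nested query plus outer must_not: the root document matches
    when NO access model excludes the user."""
    some_model_excludes_user = any(
        model_excludes_user(m, user_id, group_ids) for m in doc["AccessModels"]
    )
    return not some_model_excludes_user


def all_models_include_user(doc, user_id, group_ids):
    """The requirement as originally stated: every access model
    includes the user directly or through a group."""
    return all(
        user_id in m["Users"] or bool(set(group_ids) & set(m["Groups"]))
        for m in doc["AccessModels"]
    )


print(root_matches(doc, 3, [1, 3]))  # True
print(root_matches(doc, 4, [1, 2]))  # False
```

Both formulations agree on the two users discussed in this thread. In this model, a document with an empty AccessModels list also passes, since there is no access model that could exclude the user; that is worth checking against your own data.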

Here's the second query (user ID 4, group IDs 1 and 2), which doesn't match the document:

POST /test/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "nested": {
            "path": "AccessModels",
            "query": {
              "bool": {
                "must_not": [
                  {
                    "term": {
                      "AccessModels.Users": {
                        "value": 4
                      }
                    }
                  },
                  {
                    "terms": {
                      "AccessModels.Groups": [1, 2]
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}
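
The two queries differ only in the user ID and the group IDs, so the request body can be generated per user. Here is a minimal Python sketch that builds the same structure as a plain dict; `build_access_filter` is a hypothetical helper name, and the field names are the ones from the mapping above.

```python
import json


def build_access_filter(user_id, group_ids):
    """Build the double-negated nested filter for one user.

    Hypothetical helper: it only assembles the query body shown in
    this thread and does not talk to Elasticsearch.
    """
    return {
        "bool": {
            "must_not": [
                {
                    "nested": {
                        "path": "AccessModels",
                        "query": {
                            "bool": {
                                # Nested docs that include neither the
                                # user nor any of the user's groups.
                                "must_not": [
                                    {"term": {"AccessModels.Users": {"value": user_id}}},
                                    {"terms": {"AccessModels.Groups": list(group_ids)}},
                                ]
                            }
                        },
                    }
                }
            ]
        }
    }


# Request body for the second query (user ID 4, group IDs 1 and 2).
body = {"query": build_access_filter(4, [1, 2])}
print(json.dumps(body, indent=2))
```

Since the goal is an access filter applied to every query, one option is to place the returned dict in the filter clause of an outer bool query alongside the user's actual search; that wiring is left out of the sketch.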

Lemme know if that works for you, and thanks again for your patience and extra clarification :slight_smile:


(Zachary Tong) #7

Oh, just a note: I mapped the fields as keyword rather than long, because IDs are typically queried with single-point lookups (like above) which the keyword mapping is optimized for. Numerics are slower at single-point lookups, but much faster for range-style queries.

Feel free to adjust as necessary, the query will continue to work just fine, it'll just be slower with numerics. But if you do lots of ranges (find all users > 1000 < 2000) numerics may be a better choice.


#8

That seems to be exactly what I was looking for :slight_smile: Thanks!


(Zachary Tong) #9

Awesome! Glad we could get it working for you :slight_smile:

