Permission Trees

I'm trying to use Elasticsearch in an environment where I need
document-by-document permissions. We have considered using an external
store for the permissions mapping, but the sheer number of documents that
will ordinarily be filtered by permissions alone seems to make this a
questionable choice for performance reasons.

So, we found NestedDocuments which at first glance seem to fit the
bill. Put the type of permissions and the groups into the NestedDocument
for each document. This seems to work well (queries are easy and fast)
until it is necessary to change a tree of documents' permissions. A
typical change would be: there are a couple of million documents in a
particular sub-tree and an administrator changes the permissions on that
tree. I can't see how to update the NestedDocument for all the documents
without actually updating each one. I was hoping that there was a way to
treat the NestedDocuments like a separate store and update all the
identical ones with a single NestedDocument update. Is there anything like
this?

Or alternatively is there another option? Fwiw, we also looked at
'parent', but since that's a single parent and we probably want that for
another purpose, it didn't seem like a fit for permissions.

mbaryu

--

Hello,

If you want to have permissions for each of your N documents, then I
think there's no getting away from having to hold N "access control
lists". Which also means updating many of those ACLs if necessary :(

That said, I would try to use logical containers, if that's applicable.
Depending on how you use your data (you mentioned something about
trees, but I didn't really get it), I suppose you can set ACLs on some
containers. For example, by index or type: you could store a
document somewhere that specifies which users can access
documents from a specific index/type. Then, when you want to change
permissions for the whole bunch, you can just update one document. You
can also define custom containers by using one or more separate fields in
your documents. For example, documents with the same value of the field
"tag" would get their permissions from a document stored somewhere.
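
As a rough illustration of the container idea above, here is a minimal Python sketch. It assumes each document carries a "tag" field and that one small ACL record exists per tag value. The ACL table, the user names, the "data" field, and the helper names are all hypothetical, and the query body uses the bool/match/terms query DSL, whose exact syntax may differ between Elasticsearch versions.

```python
# Hypothetical container ACLs: one small record per "tag" value.
# In practice these could live in their own index or in an external store.
CONTAINER_ACLS = {
    "projects": {"alice", "bob"},
    "finance": {"carol"},
    "public": {"alice", "bob", "carol"},
}


def tags_for_user(user, acls=CONTAINER_ACLS):
    """Return the sorted list of tags whose ACL includes the user."""
    return sorted(tag for tag, users in acls.items() if user in users)


def build_search_body(user, text):
    """Build a search body that only matches documents in allowed containers."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"data": text}},                # the user's search
                    {"terms": {"tag": tags_for_user(user)}},  # permission restriction
                ]
            }
        }
    }
```

Changing who can see a whole container then means editing one entry in the ACL table, not every document that carries the tag. The cost is the extra lookup of the user's tags before each search.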

Best regards,
Radu

http://sematext.com/ -- Elasticsearch -- Solr -- Lucene

--

Thanks for the input Radu!

Ah, well I didn't explain that well enough I guess. We have permission
containers in the rdb. In Solr we're looking at using the 4.0 join syntax,
but it isn't particularly fast on read. In one Solr index we have the
permission objects and in the other we have the main body of documents. To
query we do something like (in sql-like terms): SELECT * FROM data WHERE
AND FILTER permgroup CONTAINS-IN (SELECT permgroup FROM
permindex WHERE permusers CONTAINS );. In the usual case, let's say
we've got a directory structure like /usr/local/lib/winnie and / has
{ALL-RO}, while /usr has {ALL-RO,GROUP_B-RW}. /usr/local and all children
inherit that permission and point to it (the permgroup is shared among all
inherited children). NestedDocuments does the reads on this very fast at
the cost of needing to rewrite the tree if the permissions on /usr change.
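
The inheritance model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionary of explicit permissions and the function name are hypothetical, and the values simply mirror the /usr example in this post.

```python
import posixpath

# Explicit permission groups, set only where an administrator changed them.
# The values mirror the example in the post; the dict itself is hypothetical.
EXPLICIT_PERMS = {
    "/": frozenset({"ALL-RO"}),
    "/usr": frozenset({"ALL-RO", "GROUP_B-RW"}),
}


def effective_perms(path, explicit=EXPLICIT_PERMS):
    """Walk up from path to the nearest ancestor with explicit permissions."""
    current = posixpath.normpath(path)
    while True:
        if current in explicit:
            return explicit[current]
        if current in ("/", ""):
            # Reached the top without finding any explicit entry.
            raise KeyError("no permissions defined for " + path)
        current = posixpath.dirname(current)
```

With this layout a change to /usr touches one entry, and every descendant picks it up at read time. Copying the resolved permissions into each document makes reads cheap but forces a rewrite of every descendant when that one entry changes, which is the trade-off in question here.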

If I define a tag in Elasticsearch, let's say I put the permission-group
name in the tag so I've got {id,data,permission-group} as my index, how do
I query the data? I could list all the permission-groups a user is in, but
that seems expensive when brought from an outside source (it's also not
cheap in the inner-select shown above, but at least it's local). Other
thoughts?

mbaryu

--

Hello,

I think I understand the situation better now. And I don't have any
other thoughts: of all the options I have in mind, the nested
document approach (which you already have) seems the best. But I'll
let you know if a better option pops into my head :)

Best regards,
Radu

http://sematext.com/ -- Elasticsearch -- Solr -- Lucene

--

Hi,

Just to make sure the obvious is not missed - have you looked at ManifoldCF
to see if that has what you need in terms of ACLs?
It can output docs to both Solr and ES.

Otis


--

This thread is from a couple of weeks ago, but I thought I'd make a
suggestion anyway.

I can't suggest a way to avoid updating each document's permissions when
they change, but I can suggest a way to update ONLY the permission
information. Instead of storing the permission structure in a NESTED
object, store it in a CHILD object. That way you can get, delete, and
insert the new permission information separately from the document itself.
It sounds like that would work in your situation. It requires
addressing each child document, but it doesn't require you to touch the
parent object.

As to changing all with one magic command, I don't think there is such a
thing. To me an index is about finding documents in indexes, not about
managing all of the data in the index.
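
A minimal sketch of the child-document idea above, in Python. The child type name "permission", the "groups" field, and the ID scheme are all hypothetical, and the term query assumes "groups" is indexed without analysis. The has_child query is shown in its classic parent/child form, so the details may need adjusting for a given Elasticsearch version.

```python
def permission_child(parent_id, groups):
    """Describe the child document that holds permissions for one parent."""
    return {
        "id": "perm-" + str(parent_id),  # hypothetical ID scheme
        "parent": parent_id,             # the main document stays untouched
        "source": {"groups": sorted(groups)},
    }


def readable_by(group, child_type="permission"):
    """Build a query for parents that have a permission child naming the group."""
    return {
        "query": {
            "has_child": {
                "type": child_type,
                "query": {"term": {"groups": group}},
            }
        }
    }
```

A permission change still means re-indexing one small child per affected document, but the (possibly large) parent documents are left alone.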

-Paul

--

Thanks to both P and Otis for suggestions!

I took a quick look at ManifoldCF but I haven't had enough time to really
look into it properly yet. Is there a quick-start guide?

As for nested vs parents, we are now using a parent + nested system where
container objects have permissions and leaves have parents. Seems to work
well enough and balances the amount of tree update with speed and also not
causing too much data to be on a single shard. This is also suited to our
directory-like structure - files are leaves and folders are containers.
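
For anyone curious what a query against such a layout might look like, here is a rough Python sketch. The parent type "folder", the nested path "permissions", and the "group" sub-field are made-up names, not our actual mapping, and the has_parent and nested queries are written in the classic parent/child form, which may differ in other Elasticsearch versions.

```python
def leaves_readable_by(group, parent_type="folder", path="permissions"):
    """Build a query for leaf documents whose parent container grants the group."""
    return {
        "query": {
            "has_parent": {
                "parent_type": parent_type,
                "query": {
                    # The container holds its permissions as nested objects.
                    "nested": {
                        "path": path,
                        "query": {"term": {path + ".group": group}},
                    }
                },
            }
        }
    }
```

Under these assumptions, a permission change rewrites only the container documents, and the leaves pick it up through the parent link.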

Chris...

--