Query-time per-document authorization

Hello,

if one were to integrate elasticsearch with an external access management
service that authorizes users on a "per view" basis, how should one
approach the issue? Let's say that any form of index-side caching of the
authorization information is out of the question. Every result set needs to be
filtered by querying the external access management service. Although it
would surely impose a hefty performance penalty, in Solr I can imagine solving
this with a custom PostFilter. Is there equivalent functionality in
elasticsearch? How could the problem be addressed?

Thanks,
Peter

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Peter,

There are some common use cases where a post filter could have trouble,
e.g. showing the top N documents that match a given query. A filter might
(correctly) take all of the results out. You could over-fetch, but if the
percentage of docs visible to the user is fairly small you could still miss.
You'd also have to decide what to do with aggregates / facets. Can I count
docs that I can't see?
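
To make the over-fetching risk concrete, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that visibility is independent of ranking, so a fixed fraction of any fetched batch is visible. The function names are made up for this example.

```python
import math


def expected_visible(fetched: int, visible_fraction: float) -> float:
    """Expected number of fetched hits that survive the authorization filter.

    Assumes visibility is independent of ranking, which is an illustrative
    simplification rather than something stated in this thread.
    """
    return fetched * visible_fraction


def fetch_size_for(page_size: int, visible_fraction: float) -> int:
    """Hits to fetch so that, on average, page_size of them are visible."""
    if not 0 < visible_fraction <= 1:
        raise ValueError("visible_fraction must be in (0, 1]")
    return math.ceil(page_size / visible_fraction)
```

With a quarter of the documents visible, a page of 10 needs about 40 fetched hits on average. At a few percent the fetch size runs into the hundreds, and since this is only an average, any single request can still come up short.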

For one project we pushed hashes of the groups and users into arrays on
each document. It worked, but only because the permissions did not change
often and user-level authorizations were rare. Not sure what we'd have done
if they had changed more often; we feared heavy reindex costs in that scenario.
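
A minimal sketch of the hashed-token idea, not the project's actual code: the choice of SHA-1, the 16-character truncation, the `acl` field name, and all function names are assumptions made for illustration.

```python
import hashlib
from typing import Dict, Iterable, List


def acl_token(kind: str, name: str) -> str:
    """Opaque token for a principal, e.g. ("group", "sales")."""
    digest = hashlib.sha1(f"{kind}:{name}".encode("utf-8")).hexdigest()
    return digest[:16]


def with_acl(doc: Dict, groups: Iterable[str], users: Iterable[str]) -> Dict:
    """Return a copy of doc carrying an array of allowed-principal tokens."""
    tokens = [acl_token("group", g) for g in groups]
    tokens += [acl_token("user", u) for u in users]
    return {**doc, "acl": sorted(tokens)}


def tokens_for(user: str, groups: Iterable[str]) -> List[str]:
    """Tokens to match against the acl array at query time."""
    return sorted([acl_token("user", user)] + [acl_token("group", g) for g in groups])


def can_see(doc: Dict, tokens: Iterable[str]) -> bool:
    """True if the document's acl array shares at least one token."""
    return not set(doc["acl"]).isdisjoint(tokens)
```

At query time the user's tokens would go into a filter on the `acl` field. The reindex cost mentioned above comes from the fact that every permission change means rewriting the `acl` array of each affected document.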

Doubt that helped with a solution but maybe it helped with what's not a
solution.

--Mike

(In reply to Peter Galiovský's message of Fri, May 24, 2013 at 11:14 AM.)

If I understand correctly, you want to restrict people to seeing only the
documents they're allowed to see? First, as Michael writes, filtering the
returned results might severely impact the usability/experience for users
(no results, etc.).

I think the solution depends on how you embed the information into the
documents.

For instance, in the “each user must see only ‘their’ documents” case, you
would simply add a user_id field to the document and filter on this field,
preferably with a filtered query. For the “only people in the ‘sales’
department can see these documents” case, you'd use a similar approach,
embedding the department names/codes in the document; when the user
performs a search, you probably have information about the departments
they're part of, and can update the query accordingly.
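
A minimal sketch of such a request body, built as a plain Python dict. The `filtered` query with `term`/`terms` filters is the syntax of the elasticsearch versions current at the time of this thread (later major versions replaced it with a `bool` query that has a `filter` clause). The field names `user_id` and `department` and the function names are assumptions for illustration.

```python
import json
from typing import Dict, List


def owner_filter(user_id: str) -> Dict:
    """Filter for the "each user sees only their documents" case."""
    return {"term": {"user_id": user_id}}


def department_filter(departments: List[str]) -> Dict:
    """Filter for the "only these departments may see it" case."""
    return {"terms": {"department": departments}}


def restricted_query(text: str, acl_filter: Dict) -> Dict:
    """Wrap the user's full-text query in an authorization filter."""
    return {
        "query": {
            "filtered": {
                "query": {"query_string": {"query": text}},
                "filter": acl_filter,
            }
        }
    }


if __name__ == "__main__":
    body = restricted_query("quarterly report", department_filter(["sales", "support"]))
    print(json.dumps(body, indent=2))
```

Because the restriction is part of the query itself, hit counts, facets and pagination only ever see documents the user is allowed to see.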

If “any form of index-side caching of the authorization information is
out of the question” means that you want to filter the results in 100%
real time, then I'm afraid your only solution is to perform a query, get the
results, filter them, check whether you've got enough and, if not, repeat the
process. I have a bit of a hard time picturing this requirement being
accepted as reasonable.
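
The query, filter, repeat loop described above can be sketched as follows. Both callbacks are hypothetical stand-ins: `search(offset, size)` for a paged search request and `is_allowed(hit)` for the call to the external access management service. The `max_batches` cap is an added safety assumption so that a user who can see almost nothing does not trigger an unbounded scan.

```python
from typing import Callable, Dict, List

Hit = Dict[str, object]


def authorized_page(
    search: Callable[[int, int], List[Hit]],
    is_allowed: Callable[[Hit], bool],
    wanted: int,
    batch_size: int = 50,
    max_batches: int = 20,
) -> List[Hit]:
    """Collect up to `wanted` authorized hits by paging through raw results."""
    page: List[Hit] = []
    if wanted <= 0:
        return page
    offset = 0
    for _ in range(max_batches):
        hits = search(offset, batch_size)
        if not hits:
            break  # no more matches in the index
        for hit in hits:
            if is_allowed(hit):  # one authorization check per candidate hit
                page.append(hit)
                if len(page) == wanted:
                    return page
        offset += len(hits)
    return page  # may be shorter than `wanted`
```

Note that this only produces a page of hits. Total counts and facets computed by the search engine would still include documents the user cannot see.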

Karel

(In reply to Peter Galiovský's message of Friday, May 24, 2013 5:14:42 PM UTC+2.)

Michael, Karel, thank you both for your ideas! I had similar thoughts on
this issue. If at all possible, I'll store the security information in the
index. I just want to be prepared in case this turns out not to be
possible. In that case, most likely a list of "authorized" roles would be
stored with each document in the index. At query time, for each possible
search result, I would have to ask the external module: "Does this user
have any of the 'authorized' roles on this document?"
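
A minimal sketch of that per-document check, assuming the external module can be wrapped in a callable that returns the roles a user currently holds on a document. `RoleLookup`, `is_authorized` and the parameter names are hypothetical and only illustrate the question being asked.

```python
from typing import Callable, Iterable, Set

# Hypothetical stand-in for the external security module: given a user and a
# document id, return the roles that user currently holds on that document.
RoleLookup = Callable[[str, str], Set[str]]


def is_authorized(
    user: str,
    doc_id: str,
    authorized_roles: Iterable[str],
    lookup_roles: RoleLookup,
) -> bool:
    """True if the user holds at least one of the document's authorized roles."""
    held = lookup_roles(user, doc_id)  # one external call per candidate hit
    return any(role in held for role in authorized_roles)
```

Here `authorized_roles` would come from the field stored with the document in the index, while `lookup_roles` is evaluated live, so nothing about the user's current grants is cached on the index side.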

As Karel mentions, doing this "post-search" brings a lot of usability
issues. That's why Solr's PostFilter looks appealing. The name is actually
slightly misleading: it's not a "post-search" filter of the kind mentioned.
Instead, it is described as "a mechanism to further filter documents after
they have already gone through the main query and other filters. This is
appropriate for filters with a very high cost."
(PostFilter, Solr 4.3.0 API)
As it's still done "in the search engine", facets, pagination/limit/offset
etc. should work as usual.

Perhaps my question then really is: What's the proper way of implementing a
custom non-caching filter for elasticsearch? And how can it be used in a
query such that it is evaluated last?

Peter

(In reply to Karel Minařík's message of Saturday, May 25, 2013 9:11:37 UTC+2.)

Hi Peter,

Sorry that I didn't read up on the post filter before responding. I am
confused about how that works with your #1 constraint of no security tokens
on the server side. If it's being executed on the server, it must have some
security information for comparison. Am I missing something?

--Mike

(In reply to Peter Galiovský's message of Mon, May 27, 2013 at 12:01 PM.)

Hi Mike,

my apologies for not being clear enough about what I want to achieve.
Perhaps I'm a bit naive, but I was thinking about making a remote call
(let's say using some low-overhead web service) to the external security
module from the custom filter class. I know this sounds horribly scary from
a performance perspective. I just need a backup plan in case storing all the
necessary authorization info in the index isn't possible.

Peter

(In reply to Michael Sick's message of Monday, May 27, 2013 19:07:01 UTC+2.)

Hi Peter,

maybe you should look at ManifoldCF
http://manifoldcf.apache.org/en_US/index.html
As far as I know, they are implementing a security model for search engines
(including Solr and Elasticsearch), though I haven't used it myself.

Regards,
Lukas

(In reply to Peter Galiovský's message of Mon, May 27, 2013 at 7:39 PM.)

Maybe this is interesting:
https://groups.google.com/forum/?fromgroups#!topic/elasticsearch/tavroa3Nw5g

(In reply to Lukáš Vlček's message of Tuesday, May 28, 2013 09:28:45 UTC+2.)