As said, security/ACL is a wide subject.
In my systems, Java app servers sit in front of the ES search and take on
the burden of session management and the presentation layer, which
includes access permissions. Because I don't have a concept of document
ownership, I let the users choose which index they search on.
In the app configuration, I assign which ES indexes are available to
which users.
If I understand correctly, your question boils down to a general
question of document modeling:
- there are users and documents; each document is tagged with a code for
a varying number of users, and documents are stored (in a primary store)
and indexed (in a secondary store, the search engine)
- there are different codes: codes that are used for searching
documents, and maybe more codes for retrieving document content
- when users and documents are updated, all codes must be assigned
correctly and immediately (because this is security, it is critical to
maintain correct document access under all kinds of circumstances)
The last observation is the hardest part. It mostly leads to the
decision not to index access codes, but to keep them in a single place
where updates can be performed in an isolated, transactional environment.
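A minimal sketch of keeping the access codes in one place and filtering
the search hits afterwards. It assumes the search engine hands back plain
document ids and uses an in-memory map as a stand-in for the ACL tables of
the transactional primary store; PostFilterAcl and filterHits are
hypothetical names, not part of Elasticsearch or Spring Security.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: access codes stay in one place (an in-memory map
// standing in for the transactional primary store, e.g. PostgreSQL), and
// search hits are filtered against it after the search has run.
final class PostFilterAcl {

    // documentId -> users allowed to read it (stand-in for the ACL tables)
    private final Map<String, Set<String>> readAcl;

    PostFilterAcl(Map<String, Set<String>> readAcl) {
        this.readAcl = readAcl;
    }

    // Keep only the hits the user may read; unknown documents are denied.
    List<String> filterHits(String user, List<String> hitIds) {
        return hitIds.stream()
                .filter(id -> readAcl.getOrDefault(id, Set.of()).contains(user))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        PostFilterAcl acl = new PostFilterAcl(Map.of(
                "d1", Set.of("u1", "u2"),
                "d2", Set.of("u2")));
        // hypothetical hit list returned by the search for user u1
        System.out.println(acl.filterHits("u1", List.of("d1", "d2", "d3"))); // [d1]
    }
}
```

One trade-off of filtering after the search: a page of hits can shrink
once filtered, so the application may have to fetch more hits than it
displays.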
If you want to move the access permissions into the search, you have to
carefully design the documents you want to index.
One of the most widely used basic principles of indexing documents is
known as denormalization.
Instead of normalizing data, as you would for a relational database,
you do it the other way round: you select the keys and assign them to
each and every document they belong to (similar to a select query in a
relational database).
Example:
"user": [ "u1", "u2" ]
"permission" : [ "p1", "p2" ]
"documents" : [ "d1"]
access relation table for d1:
u1 -> p1
u2 -> p1
u2 -> p2
=>
{
  "_id" : "d1",
  "permissions" : [
    { "u1" : [ "p1" ] },
    { "u2" : [ "p1", "p2" ] }
  ],
  "content" : {
    ....
  }
}
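The denormalization step itself can be sketched in Java, the language of
the app servers mentioned above. This assumes the access relation table is
available as (document, user, permission) rows and builds a map of user to
permissions instead of the array of single-key objects shown above;
serializing it to JSON is left to whatever JSON library the application
already uses. All names are hypothetical, and records need Java 16 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: turn rows of the access relation table into one
// "permissions" entry per document (userId -> permissions), like d1 above.
final class AclDenormalizer {

    record AccessRow(String doc, String user, String permission) {}

    // documentId -> (userId -> permissions), insertion order preserved
    static Map<String, Map<String, List<String>>> denormalize(List<AccessRow> rows) {
        Map<String, Map<String, List<String>>> byDoc = new LinkedHashMap<>();
        for (AccessRow r : rows) {
            byDoc.computeIfAbsent(r.doc(), d -> new LinkedHashMap<>())
                 .computeIfAbsent(r.user(), u -> new ArrayList<>())
                 .add(r.permission());
        }
        return byDoc;
    }

    public static void main(String[] args) {
        List<AccessRow> table = List.of(
                new AccessRow("d1", "u1", "p1"),
                new AccessRow("d1", "u2", "p1"),
                new AccessRow("d1", "u2", "p2"));
        System.out.println(denormalize(table)); // {d1={u1=[p1], u2=[p1, p2]}}
    }
}
```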
This permissions object can be simplified if you have a concept of
"document ownership", that is, a fixed set of users and permissions,
e.g. users u1 and u2 with the permissions read and write. Then you can
denormalize documents into the index/type model like this:
{
  "_index" : "u1",
  "_type" : "read",
  "_id" : "d1",
  "content" : {
    ...
  }
}
{
  "_index" : "u2",
  "_type" : "read",
  "_id" : "d1",
  "content" : {
    ...
  }
}
{
  "_index" : "u2",
  "_type" : "write",
  "_id" : "d1",
  "content" : {
    ...
  }
}
Note that the _id "d1" is indexed three times, once for each user and
permission code. This is the overhead of denormalization. Searches should
be targeted at the right index/type in ES so that at most one copy of a
document is returned.
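A hypothetical sketch of the fan-out for the ownership model, assuming the
same three grants as in the example: it only computes the _index/_type/_id
address of each copy and does not talk to Elasticsearch. The class and
record names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: each (user, permission) grant of a document becomes
// its own copy, addressed as _index = user, _type = permission, same _id.
// Sending the copies to Elasticsearch is intentionally left out.
final class OwnershipFanOut {

    record Grant(String user, String permission) {}

    record IndexTarget(String index, String type, String id) {}

    static List<IndexTarget> targetsFor(String docId, List<Grant> grants) {
        List<IndexTarget> targets = new ArrayList<>();
        for (Grant g : grants) {
            targets.add(new IndexTarget(g.user(), g.permission(), docId));
        }
        return targets;
    }

    public static void main(String[] args) {
        List<Grant> grants = List.of(
                new Grant("u1", "read"),
                new Grant("u2", "read"),
                new Grant("u2", "write"));
        // one copy of d1 per grant: three targets in total
        targetsFor("d1", grants).forEach(System.out::println);
    }
}
```

Mapping types (_type) were deprecated and later removed in newer
Elasticsearch releases, so on a recent version the permission part of the
address would have to move into the index name or a field.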
In summary, you have several options:
- ignore the permission layer for documents in Elasticsearch completely,
and check for access in the (transactional) primary storage before the
search is performed
- transform the permissions into an Elasticsearch index/type model (only
recommendable with the concept of document ownership and small sets of
possible permissions), that is, address the documents by users and
permission codes on the index/type level
- map the whole permission layer to tags that are suitable for
Elasticsearch filtering, and assign these tags to the documents at index
time so that relevant documents can be filtered later. The tags in the
permission map should be designed as simple and short as possible, and
every document is indexed only once.
- since there is no inheritance in Elasticsearch documents, put your
tags into JSON objects/arrays instead, e.g. permission "p1" and permission
"p2" become "permission" : [ "p1", "p2" ], which is suitable for a
boolean "and" filter
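To illustrate the tag approach, here is an in-memory sketch of what a
boolean "and" filter over the "permission" array does: a document matches
only if it carries every required tag. It is not Elasticsearch code;
TagFilter and andFilter are hypothetical names, and each document is
assumed to be indexed once with its flattened tags.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical in-memory illustration: every document carries a flat set of
// short permission tags; the "and" filter keeps only documents that have
// ALL required tags (what the filter would do inside the search engine).
final class TagFilter {

    record TaggedDoc(String id, Set<String> permission) {}

    static List<String> andFilter(List<TaggedDoc> docs, Set<String> required) {
        return docs.stream()
                .filter(d -> d.permission().containsAll(required))
                .map(TaggedDoc::id)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<TaggedDoc> docs = List.of(
                new TaggedDoc("d1", Set.of("p1", "p2")),
                new TaggedDoc("d2", Set.of("p1")));
        System.out.println(andFilter(docs, Set.of("p1", "p2"))); // [d1]
        System.out.println(andFilter(docs, Set.of("p1")));       // [d1, d2]
    }
}
```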
If you must reassign permissions often, you must make sure to reindex the
affected documents and their permissions just as often, which puts extra
workload on the system. Test carefully to check whether your requirements
fit these conditions.
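A rough sketch of finding the documents that must be reindexed when one
user's permissions change, assuming the denormalized document -> user ->
permissions map from the primary store is available both before and after
the change. ReindexPlanner is a hypothetical name; sending the documents to
Elasticsearch is left out.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: a document needs reindexing if the given user's
// permissions on it differ between the old and the new ACL state.
final class ReindexPlanner {

    static Set<String> docsToReindex(String user,
            Map<String, Map<String, Set<String>>> before,
            Map<String, Map<String, Set<String>>> after) {
        Set<String> docs = new TreeSet<>(before.keySet());
        docs.addAll(after.keySet());
        Set<String> affected = new TreeSet<>();
        for (String doc : docs) {
            Set<String> old = before.getOrDefault(doc, Map.of()).getOrDefault(user, Set.of());
            Set<String> now = after.getOrDefault(doc, Map.of()).getOrDefault(user, Set.of());
            if (!old.equals(now)) {
                affected.add(doc);
            }
        }
        return affected;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Set<String>>> before = Map.of(
                "d1", Map.of("u1", Set.of("p1"), "u2", Set.of("p1", "p2")),
                "d2", Map.of("u2", Set.of("p1")));
        // u2 loses p2 on d1; nothing changes on d2
        Map<String, Map<String, Set<String>>> after = Map.of(
                "d1", Map.of("u1", Set.of("p1"), "u2", Set.of("p1")),
                "d2", Map.of("u2", Set.of("p1")));
        System.out.println(docsToReindex("u2", before, after)); // [d1]
    }
}
```

The size of the returned set is the extra indexing work caused by a single
reassignment, which is why frequent changes become costly.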
Jörg
On 28.02.13 09:08, Ümit Seren wrote:
My question is conceptual rather than technical.
Let's assume the following:
- I have a PostgreSQL database that contains the data of my
web-application.
- I use the JDBC River to sync selected documents between the
PostgreSQL database and my elasticsearch instance.
- I have a Spring-based web-application that uses Spring Security ACL
as a permission system (READ, WRITE, CREATE, DELETE).
- I have permissions/ACL on only two documents (i.e. Document A,
Document B).
- Document B can inherit permissions from Document A (by default).
The service methods in my Spring application that require
authorizations are annotated with appropriate checks (they use AOP to
check permissions). That works fine.
When a user searches, the search doesn't run directly against
elasticsearch; instead, the search term is sent to the web-application,
which creates the search query and sends it to elasticsearch.
The hits are then analyzed by the backend and populated with some
additional data from the PostgreSQL database and sent back to the user.
Now I also want to apply the same security checks when searching
for either Document A or B.
Here are the points I am concerned about:
1.) Is it actually a "good" approach to use the web-application as a
proxy for my search queries (do other people do that?)
2.) Should I add the ACLs to my documents in elasticsearch and filter
them based on that, OR should I store ACLs only in my PostgreSQL
database and filter the hits from elasticsearch in my backend?
I guess storing ACLs in elasticsearch has a performance benefit.
But the drawback is that I have to maintain the ACLs in both the
PostgreSQL database and elasticsearch.
I have been searching the forum for some answers and I came up with
this one:
https://groups.google.com/d/topic/elasticsearch/By71n8zL56U/discussion
It would be really helpful to hear some first-hand experience. I guess
this is a common use case.
--
You received this message because you are subscribed to the Google
Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send
an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.