Security and ACLs

My question is conceptual rather than technical.

Let's assume the following:

  • I have a PostgreSQL database that contains the data of my
    web-application.
  • I use the JDBC River to sync selected documents between the PostgreSQL
    database and my elasticsearch instance.
  • I have a Spring based web-application that uses Spring Security ACL as a
    permission system (READ, WRITE, CREATE, DELETE).
  • I have permissions/ACL on only two documents (i.e. Document A, Document
    B).
  • Document B can inherit permissions from Document A (default)

The service methods in my Spring application that require authorizations
are annotated with appropriate checks (they use AOP to check permissions).
That works fine.

When a user searches, the search does not run directly against
elasticsearch. Instead, the search term is sent to the web-application, which
creates the search query and sends it to elasticsearch.
The hits are then analyzed by the backend and populated with some
additional data from the PostgreSQL database and sent back to the user.

Now I also want to apply the same security checks when searching for
either Document A or B.

Here are the points I am concerned about.

1.) Is it actually a "good" approach to use the web-application as a proxy
for my search queries (do other people do that)?
2.) Should I add the ACLs to my documents in elasticsearch and filter on
them there, OR should I store the ACLs only in my PostgreSQL database and
filter the hits from elasticsearch in my backend?

I guess storing ACLs in elasticsearch has a performance benefit.
But the drawback is that I have to maintain the ACLs in both the
PostgreSQL database and elasticsearch.

I have been searching the forum for some answers and I came up with this
one: https://groups.google.com/d/topic/elasticsearch/By71n8zL56U/discussion

I would be really thankful for some first-hand experience. I guess this is a
common use case.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

As said, security/ACL is a wide subject.

In my systems I have Java app servers in front of ES which
take on session management and the presentation layer, which
includes access permissions. Because I don't have a concept of document
ownership, I can let the users choose which index they search.
In the app configuration I can assign which ES indexes are available to
which users.

If I understand correctly, your question boils down to the general
aspect of document modeling.

  • there are users and documents, each document is tagged with a code for
    a varying number of users, and documents are stored (in a primary store)
    and indexed (in a secondary store, the search engine)

  • there are different codes, codes that are used for searching
    documents, and maybe there are more codes for retrieving document content

  • when updating users and documents, all codes must be correctly
    assigned immediately (because it's security, it's critical to maintain
    the document access correctly under all kinds of circumstances)

The last observation is the hardest part. It mostly leads to the
decision not to index access codes, but to keep them in a single place
where updates can be performed in an isolated, transactional environment.

If you want to move the access permissions into the search, you have to
carefully design the documents you want to index.

One of the most widely used basic principles of indexing documents is known as
denormalization.

Instead of normalizing data, as you would for a relational database,
you do it the other way round: you select the keys and assign them to each
and every document they belong to (similar to a select query in a
relational database).

Example:

"user": [ "u1", "u2" ]
"permission" : [ "p1", "p2" ]
"documents" : [ "d1"]

access relation table for d1:
u1 -> p1
u2 -> p1
u2 -> p2

=>

{
"_id" : "d1",
"permissions" : [
{ "u1" : [ "p1" ] },
{ "u2" : ["p1", "p2"] }
],
"content" : {
....
}
}
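
As a rough illustration of this denormalization step, here is a minimal Python sketch. The function name denormalize and the sample content are made up for this example; only the shape of the resulting document follows the example above.

```python
from collections import defaultdict


def denormalize(doc_id, access_rows, content):
    """Fold the (user, permission) rows of one document into the document itself."""
    by_user = defaultdict(list)
    for user, permission in access_rows:
        by_user[user].append(permission)
    return {
        "_id": doc_id,
        # One small object per user, mirroring the example document above.
        "permissions": [{user: perms} for user, perms in by_user.items()],
        "content": content,
    }


# Access relation table for d1, as (user, permission) pairs.
rows = [("u1", "p1"), ("u2", "p1"), ("u2", "p2")]
doc = denormalize("d1", rows, {"title": "example"})
print(doc["permissions"])
```

Every change to the access relation table means rebuilding and reindexing the affected documents.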

This concept of the permissions object can be simplified if you have a
concept of "document ownership", that is, a fixed set of users and
permissions. E.g. user u1 and u2 and read/write. Then you can
denormalize documents into the index/type model like this:

{
"_index" : "u1",
"_type" : "read",
"_id" : "d1",
"content" : {
...
}
}

{
"_index" : "u2",
"_type" : "read",
"_id" : "d1",
"content" : {
...
}
}

{
"_index" : "u2",
"_type" : "write",
"_id" : "d1",
"content" : {
...
}
}

Note that the _id "d1" is indexed three times, once for each user and permission
code. This is the overhead of denormalization. Searches should be
targeted at ES so that at most one document is returned.
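
The fan-out into the index/type model could be sketched like this in Python. The helper name fan_out is hypothetical, and the sketch assumes an Elasticsearch version that supports types; it only builds the three index actions and does not talk to a cluster.

```python
def fan_out(doc_id, access_rows, content):
    """Build one index action per (user, permission) pair for a single document."""
    return [
        {"_index": user, "_type": permission, "_id": doc_id, "content": content}
        for user, permission in access_rows
    ]


rows = [("u1", "read"), ("u2", "read"), ("u2", "write")]
actions = fan_out("d1", rows, {"title": "example"})
for action in actions:
    print(action["_index"], action["_type"], action["_id"])
```

A search for user u2 with read permission would then be addressed to index "u2" and type "read" only, so "d1" comes back at most once.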

In summary, you have several options:

  • ignore the permission layer for documents in Elasticsearch completely,
    and check for access in the (transactional) primary storage before
    search is performed

  • transform the permissions to an Elasticsearch index/type model (only
    recommendable with the concept of document ownership and small sets of
    possible permissions), that is, address the documents by users and
    permission codes on the index/type level

  • create a map of the whole permission layer to tags that is suitable
    for Elasticsearch filtering, assign these tags to the documents at index
    time for later filtering relevant documents. The tags in the permission
    map should be designed as simple and short as possible, and every
    document is indexed only once.

  • as there can be no inheritance in Elasticsearch documents, you can put
    your tags in JSON objects/arrays, e.g. permission "p1" and permission
    "p2" -> "permission" : [ "p1", "p2" ] suitable for a boolean "and" filter

If you must reassign permissions often, you must also reindex the
documents and their permissions often, which puts extra workload on the
system. Test carefully whether your requirements fit
these kinds of conditions.
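
The tag-based option from the summary could look roughly like the Python sketch below. The tag format "principal:permission" and the helper names are assumptions for illustration, not something Elasticsearch prescribes; the check is done in plain Python here just to show the matching logic a filter would have to express.

```python
def permission_tags(access_rows):
    """Map (principal, permission) rows to short, de-duplicated, sorted tags."""
    return sorted({f"{principal}:{permission}" for principal, permission in access_rows})


def may_access(doc_tags, principals, permission):
    """True if any of the caller's principals carries the permission on the document."""
    wanted = {f"{principal}:{permission}" for principal in principals}
    return not wanted.isdisjoint(doc_tags)


# The document is indexed only once, with its tags attached at index time.
doc = {
    "_id": "d1",
    "permission": permission_tags([("u1", "p1"), ("u2", "p1"), ("u2", "p2")]),
    "content": {},
}
print(doc["permission"])
print(may_access(doc["permission"], ["u1"], "p2"))
print(may_access(doc["permission"], ["u2"], "p2"))
```

Keeping the tags flat and short keeps the filter simple, at the price of reindexing the document whenever its tags change.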

Jörg

In reply to Ümit Seren's message of 28.02.13 09:08

@Jörg: Thank you for these interesting concepts (I can see the advantages
and disadvantages).
Maybe I can describe my use case in a little more detail:

  • I am using some kind of RBAC permission system with 3 ROLES (Anonymous,
    User, Admin).
  • Permissions can be granted either to a ROLE or a User or both.
  • By default newly added documents have full permissions (Read,Write,
    Delete) for the user who created/added the data.
  • The user can at any point give other users permission or make the record
    public by giving READ permission to the Role anonymous.

So it seems that from a maintainability point of view storing the ACLs only
in one single point (RDBMS) is probably the easiest solution and probably
also the most flexible one (I can change the permission systems later on
without changing the document model in the search engine).

If I go with this option, how will this affect faceted searches? I probably
have to update the counts and information based on the permissions? I
guess, however, that's the same as filtering a normal search.

Thanks in advance

In reply to Jörg Prante's (joergprante@gmail.com) message of Thu, Feb 28, 2013 at 10:28 AM

Yes, technically, it is nothing but filtering a normal search. A faceted
search would have to be restricted to a subset of documents of course.

If you want to include many dimensions like user, role, access mode etc.
you have to set up the corresponding boolean filter. The filter must be
applied to all of the queries to ensure the search scope matches the
security (= search visibility) requirements.
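
A minimal Python sketch of wrapping every query in such a filter follows. The field name acl_read, the principal strings, and the helper secured_search are assumptions for illustration. The request body uses the bool query with a filter clause and "aggs" as found in recent Elasticsearch releases; older releases express the same idea with different syntax, so check the documentation for your version.

```python
import json


def secured_search(query, principals, facet_field=None, acl_field="acl_read"):
    """Wrap a query so that only documents visible to the given principals match."""
    body = {
        "query": {
            "bool": {
                "must": [query],
                # The visibility filter is part of the query, so it also
                # restricts the documents the facet counts are computed on.
                "filter": [{"terms": {acl_field: sorted(principals)}}],
            }
        }
    }
    if facet_field is not None:
        body["aggs"] = {"by_" + facet_field: {"terms": {"field": facet_field}}}
    return body


# Hypothetical principals: the user's own id plus the roles that apply.
principals = ["user:u1", "role:User", "role:Anonymous"]
body = secured_search({"match": {"content": "report"}}, principals, facet_field="category")
print(json.dumps(body, indent=2))
```

Routing all searches through one helper like this in the web application makes it harder to forget the filter on a single query.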

Jörg

On 28.02.13 11:12, Ümit Seren wrote:

If I go with this option, how will this affect faceted searches? I
probably have to update the counts and information based on the
permissions? I guess, however, that's the same as filtering a normal search.
