Security and ACLs

Umit_Seren · February 28, 2013, 8:08am

My question is conceptual rather than technical.

Let's assume the following:

I have a PostgreSQL database that contains the data of my
web-application.

I use the JDBC River to sync selected documents between the PostgreSQL
database and my elasticsearch instance.

I have a Spring-based web-application that uses Spring Security ACL as a
permission system (READ, WRITE, CREATE, DELETE).

I have permissions/ACLs on only two documents (i.e. Document A, Document
B).

Document B can inherit permissions from Document A (default).

The service methods in my Spring application that require authorizations
are annotated with appropriate checks (they use AOP to check permissions).
That works fine.

When a user searches, the search does not run directly against
elasticsearch. Instead, the search term is sent to the web-application,
which creates the search query and sends it to elasticsearch.
The hits are then analyzed by the backend, enriched with some
additional data from the PostgreSQL database, and sent back to the user.

Now I also want to apply the same security checks when searching for
Document A or Document B.

Here are the points I am concerned about:

1.) Is it actually a "good" approach to use the web-application as a proxy
for my search queries (do other people do that?)
2.) Should I add the ACLs to my documents in elasticsearch and filter them
based on that, OR should I only store ACLs in my PostgreSQL database and
filter the hits from elasticsearch on my backend?

I guess storing ACLs in elasticsearch has a performance benefit.
But the drawback is that I have to maintain the ACLs in both the
PostgreSQL database and elasticsearch.

I have been searching the forum for some answers and I came up with this
one: https://groups.google.com/d/topic/elasticsearch/By71n8zL56U/discussion

I would be really grateful for some first-hand experience. I guess this is a
common use case.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

jprante · February 28, 2013, 9:28am

As said, security/ACL is a wide subject.

In my systems I have Java app servers in front of ES which take on the
burden of session management and the presentation layer, including
access permissions. Because I don't have a concept of document
ownership, I can let the users choose which index they search on.
In the app configuration I can assign which ES indexes are available to
which users.
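
A minimal sketch of such a per-user index configuration, for illustration only. The mapping `ALLOWED_INDEXES`, the helper `resolve_indexes`, and the user and index names are all hypothetical and not taken from the setup described above:

```python
# Hypothetical app configuration: which ES indexes each user may search.
ALLOWED_INDEXES = {
    "alice": {"articles", "reports"},
    "bob": {"articles"},
}


def resolve_indexes(user, requested):
    """Return the requested indexes that the user may search, sorted.

    Raises PermissionError if none of the requested indexes is allowed,
    so the app server never forwards an unrestricted search to ES.
    """
    allowed = ALLOWED_INDEXES.get(user, set())
    granted = sorted(set(requested) & allowed)
    if not granted:
        raise PermissionError(f"{user!r} may not search {sorted(requested)}")
    return granted


print(resolve_indexes("alice", ["reports", "secret"]))  # ['reports']
```

The point of the design is that the index names sent to ES come from the server-side configuration, never directly from the client request.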

If I understand correctly, your question boils down to the general
aspect of document modeling.

There are users and documents; each document is tagged with a code for
a varying number of users, and documents are stored (in a primary store)
and indexed (in a secondary store, the search engine).

There are different codes: codes that are used for searching
documents, and maybe more codes for retrieving document content.

When updating users and documents, all codes must be correctly
assigned immediately (because it's security, it's critical to maintain
the document access correctly under all kinds of circumstances).

The last observation is the hardest part. It mostly leads to the
decision not to index access codes, but to keep them in a single place
where updates can be performed in an isolated, transactional environment.

If you want to move the access permissions into the search, you have to
carefully design the documents you want to index.

One of the most widely used basic principles of indexing documents is
known as denormalization.

Instead of normalizing data, as you would do for a relational database,
you do it the other way round: you select the keys and assign them to each
and every document they belong to (similar to the result of a select
query in a relational database).

Example:

"user": [ "u1", "u2" ]
"permission" : [ "p1", "p2" ]
"documents" : [ "d1"]

access relation table for d1:
u1 -> p1
u2 -> p1
u2 -> p2

=>

{
  "_id" : "d1",
  "permissions" : [
    { "u1" : [ "p1" ] },
    { "u2" : [ "p1", "p2" ] }
  ],
  "content" : {
    ....
  }
}
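
As a rough sketch, the transformation from the access relation table to the denormalized document could look like this. The function name `denormalize` and the sample content are assumptions made for illustration:

```python
def denormalize(doc_id, access_relations, content):
    """Fold (user, permission) rows for one document into one ES document."""
    by_user = {}
    for user, permission in access_relations:
        perms = by_user.setdefault(user, [])
        if permission not in perms:  # ignore duplicate rows
            perms.append(permission)
    return {
        "_id": doc_id,
        "permissions": [{user: perms} for user, perms in by_user.items()],
        "content": content,
    }


# Access relation table for d1, as in the example above.
rows = [("u1", "p1"), ("u2", "p1"), ("u2", "p2")]
doc = denormalize("d1", rows, {"title": "example"})  # content is made up
print(doc["permissions"])  # [{'u1': ['p1']}, {'u2': ['p1', 'p2']}]
```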

This concept of the permissions object can be simplified if you have a
concept of "document ownership", that is, a fixed set of users and
permissions, for example users u1 and u2 with read/write permissions.
Then you can denormalize documents into the index/type model like this:

{
  "_index" : "u1",
  "_type" : "read",
  "_id" : "d1",
  "content" : {
    ...
  }
}

{
  "_index" : "u2",
  "_type" : "read",
  "_id" : "d1",
  "content" : {
    ...
  }
}

{
  "_index" : "u2",
  "_type" : "write",
  "_id" : "d1",
  "content" : {
    ...
  }
}

Note that "d1" is indexed three times, once for each user and permission
code. This is the overhead of denormalization. Searches should be
targeted at a specific index and type so that at most one copy of the
document is returned.
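
A small sketch of the index/type model, assuming hypothetical helpers `index_type_copies` and `search_target`. It only builds the document copies and the search target in memory; it does not talk to Elasticsearch. Note that later Elasticsearch releases dropped support for multiple mapping types, so this mirrors the model as described here rather than a current recommendation:

```python
def index_type_copies(doc_id, access_relations, content):
    """Create one copy of the document per (user, permission) pair."""
    return [
        {"_index": user, "_type": permission, "_id": doc_id, "content": content}
        for user, permission in access_relations
    ]


def search_target(user, permission):
    """Index and type a search must be restricted to for this user and mode."""
    return {"index": user, "type": permission}


rows = [("u1", "read"), ("u2", "read"), ("u2", "write")]
copies = index_type_copies("d1", rows, {"title": "example"})  # made-up content
print(len(copies))  # 3

# Restricting the search to one index and type yields at most one copy of d1.
target = search_target("u2", "read")
visible = [
    c for c in copies
    if c["_index"] == target["index"] and c["_type"] == target["type"]
]
print(len(visible))  # 1
```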

In summary, you have several options:

Ignore the permission layer for documents in Elasticsearch completely,
and check for access in the (transactional) primary storage before the
search is performed.

Transform the permissions into an Elasticsearch index/type model (only
recommended with the concept of document ownership and small sets of
possible permissions), that is, address the documents by users and
permission codes on the index/type level.

Create a map of the whole permission layer to tags that is suitable
for Elasticsearch filtering, and assign these tags to the documents at
index time so that relevant documents can be filtered later. The tags in
the permission map should be as simple and short as possible, and every
document is indexed only once.

As there can be no inheritance in Elasticsearch documents, you can put
your tags in JSON objects/arrays, e.g. permission "p1" and permission
"p2" -> "permission" : [ "p1", "p2" ], suitable for a boolean "and" filter.

If you must reassign permissions often, you must also reindex the
documents and their permissions often, which puts extra workload on the
system. Test carefully whether your requirements fit these kinds of
conditions.
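
The tag-based option can be sketched as follows. This is only an illustration: it assumes an Elasticsearch version whose `bool` query accepts a `filter` clause (older versions expressed the same idea with other filter constructs), and a `permission` field indexed as exact values. The helper names are hypothetical:

```python
def permission_filter(required_tags, field="permission"):
    """Build a boolean "and" filter: every required tag must be on the document."""
    return {"bool": {"filter": [{"term": {field: tag}} for tag in required_tags]}}


def matches(document, required_tags, field="permission"):
    """Local equivalent of the filter, handy for unit tests without a cluster."""
    return set(required_tags) <= set(document.get(field, []))


print(permission_filter(["p1", "p2"]))
print(matches({"permission": ["p1", "p2"]}, ["p1", "p2"]))  # True
print(matches({"permission": ["p1"]}, ["p1", "p2"]))        # False
```

Because the tags live on the document, any change to the permission map means the affected documents have to be reindexed before the filter reflects it.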

Jörg

Umit_Seren · February 28, 2013, 10:12am

@Jörg: Thank you for these interesting concepts (I can see the advantages
and disadvantages).
Maybe I can describe my use case in a little more detail:

I am using a kind of RBAC permission system with 3 roles (Anonymous,
User, Admin).

Permissions can be granted to a role, to a user, or to both.

By default, newly added documents have full permissions (READ, WRITE,
DELETE) for the user who created/added the data.

The user can at any point give other users permissions or make the record
public by giving READ permission to the Anonymous role.
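
To make the model concrete, here is a minimal in-memory sketch of these rules. It is not Spring Security ACL; the `user:` / `role:` principal prefixes and all function names are assumptions chosen for illustration:

```python
FULL_PERMISSIONS = {"READ", "WRITE", "DELETE"}


def new_acl(creator):
    """Default ACL: the creating user gets full permissions."""
    return {f"user:{creator}": set(FULL_PERMISSIONS)}


def grant(acl, principal, permission):
    """Grant a permission to a user or role principal."""
    acl.setdefault(principal, set()).add(permission)


def make_public(acl):
    """Make a record public by granting READ to the Anonymous role."""
    grant(acl, "role:anonymous", "READ")


def is_allowed(acl, user, roles, permission):
    """Check the permission against the user and every role they hold."""
    principals = [f"role:{role}" for role in roles]
    if user is not None:
        principals.append(f"user:{user}")
    return any(permission in acl.get(p, set()) for p in principals)


acl = new_acl("umit")
print(is_allowed(acl, None, ["anonymous"], "READ"))  # False
make_public(acl)
print(is_allowed(acl, None, ["anonymous"], "READ"))  # True
```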

So from a maintainability point of view, storing the ACLs in a single
place (the RDBMS) is probably the easiest solution and probably also the
most flexible one (I can change the permission system later on without
changing the document model in the search engine).

If I go with this option, how will this affect faceted searches? I probably
have to update the counts and information based on the permissions? I
guess, however, that's the same as filtering a normal search.
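
A sketch of what backend-side filtering would imply for facet counts. The `acl` dictionary stands in for a lookup against the PostgreSQL ACL tables, and the function names and the `kind` field are hypothetical. One caveat worth stating: counts recomputed this way are only correct if all matching hits are fetched, not just one page of results:

```python
from collections import Counter


def can_read(acl, doc_id, principals):
    """True if any of the caller's principals has READ on the document."""
    return bool(acl.get(doc_id, set()) & set(principals))


def filter_hits(hits, acl, principals):
    """Drop hits the caller is not allowed to see."""
    return [hit for hit in hits if can_read(acl, hit["_id"], principals)]


def facet_counts(hits, field):
    """Recompute facet counts from the hits that survived the ACL check."""
    return Counter(h["_source"][field] for h in hits if field in h["_source"])


hits = [
    {"_id": "a1", "_source": {"kind": "A"}},
    {"_id": "b1", "_source": {"kind": "B"}},
    {"_id": "b2", "_source": {"kind": "B"}},
]
# Stand-in for the RDBMS: document id -> principals with READ permission.
acl = {"a1": {"user:umit"}, "b1": {"role:anonymous"}}
visible = filter_hits(hits, acl, ["user:umit", "role:anonymous"])
print(dict(facet_counts(visible, "kind")))  # {'A': 1, 'B': 1}
```

The counts returned by the search engine itself would still include "b2" here, which is why they cannot be passed through unchanged when the ACLs live only in the database.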

Thanks in advance

jprante · February 28, 2013, 3:46pm

Yes, technically, it is nothing but filtering a normal search. A faceted
search would have to be restricted to a subset of documents of course.

If you want to include many dimensions like user, role, access mode etc.
you have to set up the corresponding boolean filter. The filter must be
applied to all of the queries to ensure the search scope matches the
security (= search visibility) requirements.
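
As an illustration, a helper that wraps every search body with such a visibility filter might look like the sketch below. It assumes a query DSL in which `bool` accepts a `filter` clause and in which counts are requested via `aggs` (the later replacement for facets), plus a hypothetical `read_principals` field holding the users and roles allowed to read each document:

```python
def visibility_filter(user, roles, field="read_principals"):
    """Match documents readable by the user or by any of their roles."""
    principals = [f"role:{role}" for role in roles]
    if user is not None:
        principals.append(f"user:{user}")
    return {"terms": {field: principals}}


def secure(body, user, roles):
    """Return a copy of a search body whose query only sees visible documents."""
    secured = dict(body)
    secured["query"] = {
        "bool": {
            "must": [body.get("query", {"match_all": {}})],
            "filter": [visibility_filter(user, roles)],
        }
    }
    return secured


body = {
    "query": {"match": {"title": "acl"}},
    "aggs": {"by_kind": {"terms": {"field": "kind"}}},
}
print(secure(body, "umit", ["user"]))
```

Because the restriction sits inside the query itself, counts computed over the query results are limited to the same visible subset as the hits.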

Jörg
