Advice for implementing a secure graph index with ElasticSearch


(Jeff Kunkle) #1

I've been trying to figure out how I can index a graph data structure using
ElasticSearch and could really use some advice from someone more
knowledgeable than me. First, let me explain the challenge. The graph model
has individual access controls at the vertex (node), edge (relationship),
and property level. I'd like my users to be able to search the graph for
vertices or edges containing matching properties, with two caveats:

  1. They should not get vertex or edge results they don't have permission
    to see.
  2. Properties a user does not have access to see should not be evaluated
    in the query.

My first thought was to index properties as either nested or child
documents of a vertex/edge and use a custom filter to remove properties a
user didn't have access to. The first problem I run into is when I try a
boolean query across properties. For example, assume I want to query a
person vertex by first name and date of birth. Since these properties are
indexed as separate documents there is never a match.

What I essentially need is the ability to query across nested or child
documents and return the parent only when there are matches across the
child documents. For example, assume a parent vertex with one property
document called "full_name" set to Barack Obama and another property
document named "political_party" set to Democrat. Is there any way for me
to query for the parent document of these two properties by asking for one
property with full_name="Barack Obama" and another property with
political_party="Democrat"?
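
To pin down the semantics being asked for, here is a minimal sketch in plain Python (not ElasticSearch code). The `PropertyDoc` class, the single-label `visibility` field, and the `vertex_matches` helper are all hypothetical names used only for illustration: a vertex matches when every condition is satisfied by some property the user is allowed to see.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PropertyDoc:
    name: str
    value: str
    visibility: str  # hypothetical: one label the user must hold to see this property


def vertex_matches(properties, conditions, authorizations):
    """True when every (name, value) condition is met by some visible property."""
    # Caveat 2: hidden properties are dropped before the query is evaluated.
    visible = [p for p in properties if p.visibility in authorizations]
    return all(
        any(p.name == name and p.value == value for p in visible)
        for name, value in conditions.items()
    )


vertex = [
    PropertyDoc("full_name", "Barack Obama", "public"),
    PropertyDoc("political_party", "Democrat", "restricted"),
]
conditions = {"full_name": "Barack Obama", "political_party": "Democrat"}

print(vertex_matches(vertex, conditions, {"public", "restricted"}))  # True
print(vertex_matches(vertex, conditions, {"public"}))  # False
```

The second call returns False because the political_party property is hidden from that user, so it cannot contribute to the match.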

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elasticsearch/a8beee5b-82d0-45fa-8666-31e956c03439%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.


(Michael Sick) #2

https://github.com/thinkaurelius/titan/wiki

"Titan is a distributed graph database
<http://en.wikipedia.org/wiki/Graph_database> optimized for storing and
querying graphs <http://en.wikipedia.org/wiki/Graph_(mathematics)>
represented over a cluster of machines. The cluster can elastically scale to
support a growing dataset and user base. Titan has a pluggable storage
architecture which allows it to build on proven database technology such as
Apache Cassandra <http://cassandra.apache.org/>, Apache HBase
<http://hbase.apache.org/>, or Oracle BerkeleyDB
<http://www.oracle.com/technetwork/database/berkeleydb/>. Furthermore, the
pluggable indexing architecture supports ElasticSearch
<http://elasticsearch.com/> and Lucene <http://lucene.apache.org/>."

I did some basic research for ES + graph and found the Titan project
interesting. Titan separates storage from indexing and only currently
supports ES for the latter. I'm sure that you could implement a storage
engine based on ES too (which makes more sense now that ES 1.x supports
backup/restore). Didn't look into security at all but this might be a good
starting point. Hope it's helpful. --Mike


(Jeff Kunkle) #3

Hi Mike,

Thanks for the reply. We actually started with Titan, and it's a very good
project, but we couldn't easily add the needed security constraints on top
of it, which is why I'm exploring this topic. It would be rather
straightforward to implement the index on ElasticSearch if all the data were
open to everyone: I'd be able to consolidate all of a vertex's or edge's
properties in a single document. Unfortunately, that's not the case. The
project I'm working on is at http://lumify.io if that's helpful in any way.

Thanks Again,
Jeff


(Michael Sick) #4

Hi Jeff,

Lumify looks very interesting - I'll have to take a serious look later.
While I didn't get very far down the Titan path, my starting point would be
to use Titan for all its graph features and add some type of query pre-
and post-processing to insert ACL information into the ES queries and
indexing statements. Not sure if Titan offers any hooks - but it seems they
could be added.

I'd start with the Aurelius folks and see how something like this could be
added with low impact to the existing interfaces. As far as the ES part, I
have added ACLs to documents before by storing the IDs as an array on each
document. This worked in the simple case I needed because the owners of the
documents changed infrequently (so there was not much reindexing load) and I
didn't have to think it through more than that.
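
A minimal sketch of the ACL-array idea, assuming a hypothetical `acl` field that holds exact (not analyzed) user and group IDs. The helper name `with_acl_filter` and the IDs are made up for illustration. It uses the `bool` query with a `filter` clause from later ElasticSearch releases; on ES 1.x the same idea would be written with a `filtered` query.

```python
import json


def with_acl_filter(query, principal_ids, acl_field="acl"):
    """Wrap `query` so only documents whose ACL array contains one of the caller's IDs match."""
    return {
        "bool": {
            "must": [query],
            # `terms` matches when the field holds at least one of the listed values.
            "filter": [{"terms": {acl_field: sorted(principal_ids)}}],
        }
    }


# Hypothetical indexed document: the ACL is just an array of IDs on the document.
doc = {"full_name": "Barack Obama", "acl": ["user-17", "group-analysts"]}

# Query issued on behalf of a user holding these IDs.
body = {
    "query": with_acl_filter(
        {"match": {"full_name": "obama"}},
        {"user-42", "group-analysts"},
    )
}
print(json.dumps(body, indent=2))
```

This restricts access per document only, so it does not by itself cover property-level access controls.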

--Mike


(mohit kaushik) #5

Hi Jeff,
You said you are using Lumify. Lumify uses secure-graph, which implicitly
implements the cell-level security you need. You can easily put access
controls on your users, and queries return the vertices in the way you
want. I recently started working with secure-graph and want to implement
the class
"/securegraph-core/src/main/java/org/securegraph/query/GraphQuery.java"
which is provided in the package. Since it has been many days since your
post, I hope you have figured it out. If you have, please let me know.

Thanks
Mohit kaushik


(mohit kaushik) #6

I also want to ask: are you from Altamira? I found you on Lumify.


(Jeff Kunkle) #7

Hi Mohit,

Can you please ask your Lumify questions over on the Lumify Google group at
https://groups.google.com/d/forum/lumify? I'd rather not pollute the
ElasticSearch group with unrelated messages.

Thanks,
Jeff


(mohit kaushik) #8

OK, thanks.


(justin.hohner) #9

Have you figured out a solution to this problem yet? This may be what
Michael suggested, but you might be able to apply the group permissions to
the document. For example, create a structure like:
visibility: { groups: ["groupA", "group1"], exclude: ["groupB", "group2"]}

You could then apply the group visibility to the query.

It's not a perfect solution, but I am curious whether it would work and what
sort of impact to expect if it were used.
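
One way this could look, as a sketch rather than a tested recipe: the filter below assumes `visibility.groups` and `visibility.exclude` are indexed as exact (not analyzed) values, and the helper names `visibility_filter` and `is_visible` are hypothetical. The second function is a plain-Python equivalent of the filter, useful for checking the intended semantics without a cluster.

```python
def visibility_filter(user_groups):
    """Filter clause: the user is in an allowed group and in no excluded group."""
    groups = sorted(user_groups)
    return {
        "bool": {
            "filter": [{"terms": {"visibility.groups": groups}}],
            "must_not": [{"terms": {"visibility.exclude": groups}}],
        }
    }


def is_visible(doc, user_groups):
    """Plain-Python equivalent of visibility_filter, for checking the semantics."""
    groups = set(user_groups)
    vis = doc["visibility"]
    allowed = bool(groups & set(vis["groups"]))
    excluded = bool(groups & set(vis["exclude"]))
    return allowed and not excluded


doc = {"visibility": {"groups": ["groupA", "group1"], "exclude": ["groupB", "group2"]}}

print(is_visible(doc, {"groupA"}))            # True
print(is_visible(doc, {"groupA", "groupB"}))  # False, groupB is excluded
print(visibility_filter({"groupA", "groupB"}))
```

As with any document-level filter, this hides whole documents; properties with different visibility would still need to live in separate documents.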


(Joe Ferner) #10

I've been working with Jeff on this and I think we have figured out a
solution:
https://github.com/altamiracorp/securegraph/tree/master/securegraph-elasticsearch-plugin

By using parent/child documents and a custom filter we were able to query
documents with security. Each child document has a "visibility" string field
along with a "fieldName" field. The filter removes child documents whose
visibility is not satisfied by the authorizations supplied in the filter.
A parent document is then not returned if none of its children match.
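
A rough sketch of the query shape this implies, not the plugin's actual code. The plugin's custom filter is not reproduced here; a plain `terms` clause on "visibility" stands in for it, which only works if each visibility string is a single label. The child type name `property` and the `value` field are assumptions; "fieldName" and "visibility" come from the description above. One `has_child` clause per property, combined under `bool`/`must`, returns the parent only when every property matches in some visible child.

```python
import json


def property_clause(field_name, value, authorizations, child_type="property"):
    """has_child clause matching one visible child document for one property."""
    return {
        "has_child": {
            "type": child_type,  # assumed name of the child document type
            "query": {
                "bool": {
                    "must": [
                        {"term": {"fieldName": field_name}},
                        {"match_phrase": {"value": value}},  # "value" is an assumed field
                    ],
                    # Stand-in for the custom visibility filter.
                    "filter": [{"terms": {"visibility": sorted(authorizations)}}],
                }
            },
        }
    }


def vertex_query(conditions, authorizations):
    """Parent (vertex) query: every condition must match in some visible child."""
    return {
        "query": {
            "bool": {
                "must": [
                    property_clause(name, value, authorizations)
                    for name, value in conditions.items()
                ]
            }
        }
    }


body = vertex_query(
    {"full_name": "Barack Obama", "political_party": "Democrat"},
    {"public"},
)
print(json.dumps(body, indent=2))
```

Because the visibility check sits inside each `has_child` clause, a property the user cannot see never takes part in the match.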

