Document level security and Connectors

Hi,
Is there a plan to add support for some type of document-level
security scheme to Elasticsearch that works out of the box?
That, and some support for Google's or Apache's connector framework, and
we have a major enterprise search contender....

Keep up the fantastic work!

Hi,

On Sun, Oct 31, 2010 at 11:09 PM, Mark mark.pasternak@gmail.com wrote:

Hi,
Is there a plan to add support for some type of document level based
security scheme to Elasticsearch that works out of the box?

What do you mean by document level security? Filtering docs based on user
roles? This can be built on top of elasticsearch easily. Adding user
management to elasticsearch is certainly possible, but not in the near
future (at least not on my plan), though contribs are welcome. The reason is
that since this can easily be implemented on top of elasticsearch, the focus
is on more core features. Also, implementing it in elasticsearch would
require it to be more of a product-level feature (pluggable user management,
roles management, and so on).

That, and some support for Google's or Apache's connector framework and
we have a major enterprise search contender....

Can you refer to the relevant sites?

It makes a lot of sense to focus on the core and to let the user
community supply patches that extend the functionality and mash up
Elasticsearch with the other open source frameworks out there.

Looking at this from an enterprise search perspective, there are two
really promising connector frameworks out there, in various stages of
development, that make it possible to index enterprise sources:

http://code.google.com/p/googlesearchapplianceconnectors/
http://incubator.apache.org/connectors/

The task here would be to create a middle layer that translates the
connector protocols into something that works with Elasticsearch's APIs.
For more advanced scenarios you would want to add a processing
pipeline that lets you normalize and extend the source data. Even
more ambitious would be a persistent index queue that lets you
"replay" the indexing queue on demand, for fast reindexing. When you
make changes to the setup somewhere and need to reindex, it is
painfully slow to do it from the connectors when you deal with large
sources. Again, this is something that probably does not belong in the
core...
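
The replayable queue idea could be sketched roughly like this. It is only an illustration under assumptions: the class name, the JSON-lines file layout, and the `sink` callback are all made up for the example, and a real setup would need rotation, compaction, and error handling.

```python
import json
import os
import tempfile
from typing import Callable, Dict, Iterator


class ReplayableIndexQueue:
    """Hypothetical append-only log of index operations that can be replayed."""

    def __init__(self, path: str) -> None:
        self.path = path

    def append(self, doc_id: str, source: Dict) -> None:
        # One JSON object per line, so the log can be streamed back later.
        with open(self.path, "a", encoding="utf-8") as log:
            log.write(json.dumps({"id": doc_id, "source": source}) + "\n")

    def entries(self) -> Iterator[Dict]:
        # Yield logged operations in the order they were appended.
        if not os.path.exists(self.path):
            return
        with open(self.path, encoding="utf-8") as log:
            for line in log:
                if line.strip():
                    yield json.loads(line)

    def replay(self, sink: Callable[[str, Dict], None]) -> int:
        """Feed every logged operation to `sink`, oldest first; return the count."""
        count = 0
        for entry in self.entries():
            sink(entry["id"], entry["source"])
            count += 1
        return count


def demo() -> Dict[str, Dict]:
    # The "index" here is a plain dict standing in for the search engine.
    path = os.path.join(tempfile.mkdtemp(), "index-queue.jsonl")
    queue = ReplayableIndexQueue(path)
    queue.append("doc-1", {"title": "Quarterly report"})
    queue.append("doc-2", {"title": "Design notes"})
    queue.append("doc-1", {"title": "Quarterly report (revised)"})

    rebuilt: Dict[str, Dict] = {}
    queue.replay(lambda doc_id, source: rebuilt.__setitem__(doc_id, source))
    return rebuilt
```

Replaying in append order means a later version of a document overwrites an earlier one, so the rebuilt index ends up in the same state without going back to the connectors.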

Document level security, which the connectors support by extracting
ACLs from the source documents, means making sure you filter the
documents by the appropriate ACLs on each request. There are some
major considerations here: how to encrypt and secure the tokens, how
to resolve the security info for each user, and so on. Building this
right into the search engine can make the experience much smoother for
the developers working with the search engine, as well as more secure,
by building in a rock solid security model.
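
To make the per-request filtering concrete, here is a minimal sketch. It assumes each document was indexed with two hypothetical fields, `allow_tokens` and `deny_tokens`, holding the ACL tokens extracted by the connector, and it uses the `bool` query shape from current Elasticsearch releases rather than anything available at the time of this thread. The function name is made up too.

```python
from typing import Dict, List


def secure_query(user_query: Dict, user_tokens: List[str]) -> Dict:
    """Wrap a query so that only documents the user is allowed to see can match.

    `allow_tokens` and `deny_tokens` are assumed field names, not a standard.
    """
    return {
        "query": {
            "bool": {
                # The user's own query still decides relevance.
                "must": [user_query],
                # The document must grant access to at least one of the user's tokens.
                "filter": [{"terms": {"allow_tokens": user_tokens}}],
                # Any matching deny token excludes the document.
                "must_not": [{"terms": {"deny_tokens": user_tokens}}],
            }
        }
    }
```

The wrapping has to happen in a trusted layer that resolves the tokens for the authenticated user itself. If the client could supply its own token list, the filter would not protect anything.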

Solr has an initiative described here:
http://wiki.apache.org/solr/SolrSecurity#Document_Level_Security
There are some patches out there to enable this, but it is very
immature. I guess this is not a priority to build into the core, but it is
maybe worth considering for the roadmap.

The first questions in this FAQ talk about how you deal with document
level security with the Apache Connectors Framework:
https://cwiki.apache.org/confluence/display/CONNECTORS/FAQ

So as long as you use the connectors to index the documents, and make
sure to filter the queries in the right way, you are good to go. Not
hard to do on the client side of course...

Mark,

if I understand LCF (or ManifoldCF or whatever name it has now) correctly,
then it is not the search server itself that has specific security and
authentication logic implemented in it. That being said, I think all it takes
would be implementing an output connector for Elasticsearch; note there is
already an existing implementation for Solr:
https://cwiki.apache.org/confluence/display/CONNECTORS/How+to+Write+an+Output+Connector

This integration would definitely make sense and is logical. As you can see,
the authors of ManifoldCF planned for a general design, but note that the Solr
connector is part of the ManifoldCF code, not Solr.

Regards,
Lukas

That makes a lot of sense!
With an output connector, a large number of standard sources,
including http, could be indexed easily into Elasticsearch, and the whole
chain from more traditional sources to index would be covered.
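
As a rough sketch of what the translating side of such an output connector might do, the snippet below turns documents handed over by a connector into a request body in the newline-delimited format that the Elasticsearch bulk API accepts in current releases. The shape of the incoming documents (`uri`, `content`, `acl`) and the `allow_tokens` field are assumptions for illustration, not anything defined by ManifoldCF or Elasticsearch.

```python
import json
from typing import Dict, Iterable


def to_bulk_body(index: str, docs: Iterable[Dict]) -> str:
    """Build a bulk request body: one action line plus one source line per document."""
    lines = []
    for doc in docs:
        # Action line: index the document under its source URI as the id.
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["uri"]}}))
        # Source line: the content plus the ACL tokens (hypothetical field name).
        lines.append(json.dumps({
            "content": doc["content"],
            "allow_tokens": doc.get("acl", []),
        }))
    # Every line, including the last one, ends with a newline.
    return "".join(line + "\n" for line in lines)
```

Carrying the ACL tokens into the indexed document is what makes query-time filtering possible later on.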

Hi,

Regarding security, as is the case with LCF, you can implement it on top
of elasticsearch by storing the roles associated with a document and filtering
by that. Having something built into elasticsearch is exponentially harder than
building something on top of it, since the feature needs to be a "product"
feature. For example, a pluggable user management API (ldap and so on).

Regarding indexing data, LCF is interesting, and can have an output
connector that points to elasticsearch. Another integration point is to
build (elasticsearch) rivers that do that. What I hope is that the community
will start building rivers that allow indexing many different sources of
data.

-shay.banon
