Hi,
newbie here. I have a question about a use case. I'm evaluating Solr vs
Elasticsearch.
The use case is this: we have many users, who each have their own list of
objects they can search. Think Gmail. I should be able to search my own
email, and you should not be able to hack the system to search my email.
I am hoping we can use Elasticsearch's JSON API directly (so we don't have
to create our own JSON wrapper in between), but I'm not sure how
authentication works. Can I authenticate a user so that they can only search
their own email? Or, once a user is authenticated, can they search everyone's
email?
One option is to create an index per user. The mapping will be
consistent between indices. If there is no overlap between users (a
user will never have access to another user's content), then multiple
indices should work great.
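A minimal sketch of the index-per-user idea, assuming numeric user ids and a hypothetical "user-<id>" naming scheme (neither comes from this thread). The point it illustrates is that the index name should be derived on the server from the authenticated user id, never taken from the request:

```python
import re

# Hypothetical naming scheme: one index per user, named "user-<id>".
_USER_ID = re.compile(r"[0-9]+")


def index_for_user(user_id: str) -> str:
    """Return the index name for an already-authenticated user id."""
    # Accept digits only, so the id can never smuggle in "/", ",", "*" or
    # other characters that would change which index the request targets.
    if not _USER_ID.fullmatch(user_id):
        raise ValueError(f"invalid user id: {user_id!r}")
    return f"user-{user_id}"


def search_path(user_id: str) -> str:
    """Build the search endpoint path for one user's index."""
    return f"/{index_for_user(user_id)}/_search"
```

For example, search_path("1234") gives "/user-1234/_search", while an id such as "1234,user-5678" is rejected instead of widening the search to a second index.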
How many users do you plan to support?
What kind of search do you plan to provide? For example, do you plan to
provide spell checking, or search suggestions while the user types in the
search terms?
Hi Ivan,
thanks. An index per user seems to make sense. There is (in our case) indeed
no overlap between users, although that may change in the future (but we
could figure it out then).
The kind of search is faceted search: i.e. many objects (say, emails)
are tagged with tags in different facets. Pretty classic faceted browsing
approach, which is why we are looking at Solr and now Elasticsearch.
Search suggestions are nice but not a must-have right now.
Yes, but then couldn't user X just search for user Y's content if they know
user Y's userid?
My question is mostly around authentication: how do we prevent an app from
accessing ALL content, instead of only the content owned by the user that's
using it? This may be so simple I'm not seeing it.
I think you should check with Shay whether Elasticsearch can scale to millions
of indices. (I do not think creating a new index per user is the way to go
with millions of users.)
Which leads me to my other question. You can perfectly well store many users
in one shared index and ensure that the logic strictly applies user_id filters
when searching, but you would probably need to take care that suggestions
do not expose terms from other users' content.
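As a rough illustration of the shared-index approach, the sketch below builds a search body in which the user_id filter is always added by the server. The field names "body" and "user_id" are assumptions, and the bool/filter form shown is the query DSL of recent Elasticsearch versions; releases from the time of this thread expressed the same idea with a "filtered" query instead.

```python
import json


def build_user_query(user_id: int, text: str) -> dict:
    """Build a search body restricted to one user's documents.

    user_id must come from the authenticated session, never from the
    client's request. "body" and "user_id" are hypothetical field names.
    """
    return {
        "query": {
            "bool": {
                # Full-text part of the search, supplied by the user.
                "must": [{"match": {"body": text}}],
                # Ownership restriction, always added server-side.
                "filter": [{"term": {"user_id": user_id}}],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_user_query(1234, "foo bar"), indent=2))
```

Because the filter is attached in code rather than typed by the user, knowing another user's id is not enough to see their documents.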
On Wed, 2011-02-09 at 18:54 -0500, Peter Van Dijck wrote:
Yes, but then couldn't user X just search for user Y's content if they
know user Y's userid?
This depends entirely on your interface. Would you open up your
database to receive absolutely any request from outside?
Obviously, no.
Similarly, you want to sanitise any request coming in to Elasticsearch.
You wouldn't just put your ES server on the web. Too easy for somebody
to send a shutdown or delete-index request.
Instead, every query should pass through your front end, which applies your
application-specific rules.
For instance, you wouldn't want to accept text queries like:
"foo bar user_id:1234"
But you might want to accept:
"foo bar from:joe"
So your query string should be parsed and sanitised before passing that
query on.
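One way this check could look, as a deliberately simplified sketch: split the text on whitespace and reject any "field:value" term whose field is not on an allowlist. The allowlist below (only "from") is hypothetical, and real query-string syntax also has quotes, parentheses and escapes that this does not handle.

```python
ALLOWED_FIELDS = {"from"}  # hypothetical allowlist of user-facing fields


def sanitise_query(text: str) -> str:
    """Return the query text if it only uses allowed fields, else raise."""
    tokens = []
    for token in text.split():
        field, sep, _value = token.partition(":")
        # A token containing ":" is treated as a fielded term; it is only
        # accepted when the field name is on the allowlist.
        if sep and field.lower() not in ALLOWED_FIELDS:
            raise ValueError(f"field not allowed: {field!r}")
        tokens.append(token)
    return " ".join(tokens)
```

With this, "foo bar from:joe" passes through unchanged, while "foo bar user_id:1234" raises a ValueError before anything reaches the search server.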
OK, thanks, that helps a lot. This may be blindingly obvious to the more
experienced people, but I was thinking of actually exposing the JSON REST
API to the world. That's why I was asking about authentication per user.
So we need a pass-through, where we handle auth and such.
Thanks again, if I could upvote this answer I would.
Peter
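The pass-through described in this thread could start from something like the sketch below: the front end exposes a single search endpoint, requires an authenticated session, and refuses everything else, so shutdown or delete-index requests never reach the search server. The endpoint "/search", the shared index name "emails" and the status-code convention are all assumptions made for illustration.

```python
from typing import Optional, Tuple

BACKEND_SEARCH_PATH = "/emails/_search"  # hypothetical shared index


def authorise(
    method: str, path: str, session_user_id: Optional[int]
) -> Tuple[int, Optional[str]]:
    """Decide whether a front-end request may be forwarded.

    Returns (HTTP status, backend path to call or None).
    """
    # No authenticated session: nothing is forwarded.
    if session_user_id is None:
        return 401, None
    # Only the one public search endpoint is exposed; DELETE requests,
    # admin endpoints and raw index paths are all refused.
    if (method.upper(), path) != ("POST", "/search"):
        return 403, None
    return 200, BACKEND_SEARCH_PATH
```

The request body forwarded to the backend path would then be built by the front end itself, using the session's user id, rather than copied from the client.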