Question about use case: gmail-like "I can only see my own emails"?


(Peter Van Dijck) #1

Hi,
newbie here. I have a question about a use case. We are evaluating Solr vs
Elasticsearch.

The use case is this: we have many users, who each have their own list of
objects they can search. Think Gmail. I should be able to search my own
email, and you should not be able to hack the system to search my email.

I am hoping we can use Elasticsearch's JSON API directly (so we don't have
to create our own JSON wrapper in between), but I'm not sure how
authentication works. Can I authenticate a user so that they can only search
their own email? Or, once a user is authenticated, can they search
everyone's email?

Thanks for any pointers!

Cheers,
Peter

--
http://petervandijck.com/
http://twitter.com/petervandijck
Skype id: peterkevandijck
USA tel (SkypeIn nr.): (646) 502-8604


(IvanBrusic) #2

Hi Peter,

One option is to create an index per user. The mapping will be
consistent between indices. If there is no overlap between users (a
user will never have access to another user's content), then multiple
indices should work great.
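
A minimal sketch of how an index-per-user layout might be addressed, assuming a hypothetical `mail-user-<id>` naming scheme and numeric user ids (neither comes from this thread). The `/<index>/_search` path is the usual Elasticsearch search endpoint; the helper names are illustrative only.

```python
def index_name_for(user_id: int) -> str:
    """Return the (hypothetical) per-user index name."""
    # Accept only non-negative integers so a crafted id such as
    # "1,mail-user-2" or "*" can never widen the search to other indices.
    if isinstance(user_id, bool) or not isinstance(user_id, int) or user_id < 0:
        raise ValueError("user_id must be a non-negative integer")
    return f"mail-user-{user_id}"


def search_path_for(user_id: int) -> str:
    """Return the REST path that searches only this user's index."""
    return f"/{index_name_for(user_id)}/_search"
```

The point of the sketch is that the index name is derived on the server from the authenticated user, never taken from the request itself.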

ivan


(pcdinh) #3

Not sure whether there is anything complicated here that you did not mention,
but I think your requirement can be implemented quite simply: add a user_id
field.
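
As a rough sketch of what that could look like at index time, assuming documents are plain dictionaries and that `tag_with_owner` is a hypothetical helper in the application layer (not an Elasticsearch API). The owner is stamped on the server, so any `user_id` supplied by a client is overwritten.

```python
def tag_with_owner(doc: dict, owner_id: int) -> dict:
    """Return a copy of doc with user_id set to the authenticated owner."""
    tagged = dict(doc)            # shallow copy; the caller's dict is untouched
    tagged["user_id"] = owner_id  # overwrite anything the client sent
    return tagged


# Example: the client tried to claim user 999, the server stores user 42.
email = {"subject": "Lunch?", "body": "Noon works for me.", "user_id": 999}
stored = tag_with_owner(email, owner_id=42)
```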

--
Spica Framework: http://code.google.com/p/spica
http://www.twitter.com/pcdinh
http://groups.google.com/group/phpvietnam


(Lukáš Vlček) #4

Hi,

How many users do you plan to support?
What kind of search do you plan to provide? Do you, for example, plan to
provide search suggestions while the user types the search terms, or
spell checking?

Regards,
Lukas


(Peter Van Dijck) #5

Hi Ivan,
thanks. An index per user seems to make sense. There is (in our case) indeed
no overlap between users, although that may change in the future (but we
could figure it out then).

Peter


(Peter Van Dijck) #6

Hey Lukáš,

  • Hoping for millions of users.
  • The kind of search is faceted search, i.e. many objects (say, emails)
    are tagged with tags in different facets. Pretty classic faceted browsing
    approach, which is why we are looking at Solr and now Elasticsearch.
  • Search suggestions are nice but not a must-have right now.

Peter


(Peter Van Dijck) #7

Yes, but then couldn't user X just search for user Y's content if they know
user Y's userid?

My question is mostly about authentication: how do we prevent an app from
accessing ALL content, instead of only the content owned by the user who is
using it? This may be so simple that I'm not seeing it.

Peter


(Lukáš Vlček) #8

Peter,

I think you should check with Shay whether Elasticsearch can scale to
millions of indices. (I do not think creating a new index per user is the way
to go in the case of millions of users.)
Which leads me to my other question. You can perfectly well store many users
in one shared index and ensure that the logic strictly applies user_id filters
when searching, but you probably need to take care that suggestions do not
expose terms from other users' content.
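
A minimal sketch of such a strictly filtered search body, assuming a hypothetical `body` text field and a recent Elasticsearch version in which `bool` queries accept a `filter` clause (older releases expressed the same idea with a `filtered` query). The helper name `build_user_search` is illustrative only.

```python
import json


def build_user_search(user_id: int, text: str) -> dict:
    """Build a search body whose hits are restricted to one user's documents."""
    return {
        "query": {
            "bool": {
                # Scored full-text part, taken from the user's input.
                "must": {"match": {"body": text}},
                # Non-scoring restriction, always added by the server.
                "filter": {"term": {"user_id": user_id}},
            }
        }
    }


print(json.dumps(build_user_search(42, "quarterly report"), indent=2))
```

The same restriction would have to apply to any suggestion feature, otherwise terms from other users' documents could surface.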

Just my 2 cents.

Regards,
Lukas


(Peter Van Dijck) #9

Thanks Lukas, that helps.

Peter


(Clinton Gormley) #10

On Wed, 2011-02-09 at 18:54 -0500, Peter Van Dijck wrote:

Yes, but then couldn't user X just search for user Y's content if they
know user Y's userid?

This depends entirely on your interface. Would you open up your
database to receive absolutely any request from outside?

Obviously, no.

Similarly, you want to sanitise any request coming in to ElasticSearch.
You wouldn't just put your ES server on the web. Too easy for somebody
to send a shutdown or delete-index request.

But any query should pass through your front end and apply your
application specific rules.

For instance, you wouldn't want to accept text queries like:

"foo bar user_id:1234"

But you might want to accept:

"foo bar from:joe"

So your query string should be parsed and sanitised before passing that
query on.
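
A rough sketch of that kind of sanitising step, assuming a hypothetical whitelist that contains only the `from` field. It simply drops any whitespace-separated token that names a field outside the whitelist; it does not handle quoting, grouping or escaping, which a real query parser would need.

```python
import re

# Hypothetical whitelist: the only field prefixes end users may type.
ALLOWED_FIELDS = {"from"}

# Finds every "field:" prefix inside a token.
FIELD_PREFIX = re.compile(r"(\w+):")


def sanitise_query(raw: str) -> str:
    """Drop tokens that reference a field outside the whitelist."""
    kept = []
    for token in raw.split():
        fields = FIELD_PREFIX.findall(token)
        if any(field.lower() not in ALLOWED_FIELDS for field in fields):
            continue  # e.g. "user_id:1234" is discarded
        kept.append(token)
    return " ".join(kept)
```

With this whitelist, `sanitise_query("foo bar user_id:1234")` gives `"foo bar"`, while `"foo bar from:joe"` passes through unchanged. Rejecting the whole request instead of dropping the token would be an equally reasonable choice.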

clint


(Peter Van Dijck) #11

But any query should pass through your front end and apply your
application specific rules.

OK, thanks, that helps a lot. This may be blindingly obvious to the more
experienced people, but I was thinking of actually exposing the JSON REST
API to the world. That's why I was asking about authentication per user.

So we need a pass-through layer, where we handle auth and such.

Thanks again, if I could upvote this answer I would :)

Peter

