Design Help: Searching Users and Friends

Hi Guys,

In our app, we have users, and users can have friends (think Facebook; the relationship is bidirectional). I would like to be able to:

1- Have a site-wide search for users by name or username

2- Allow each user to search her friends by name or username

What would be the best approach to design this, keeping in mind that:

a- A user can have up to 50k friends.

b- Users can change their names and usernames all the time.

cheers,

Drew

--

bump :wink:

On Dec 6, 2012, at 3:23 PM, Drew Kutcharian drew@venarc.com wrote:

--

Elasticsearch (by the nature of Lucene) is a document store, so any
solution needs to keep that in mind. Data needs to be denormalized, since
there are no JOINs. You can look up solutions for any document store, such
as MongoDB or CouchDB; the concepts will be the same.

Elasticsearch documents (once again, due to Lucene) are not easily
updatable, so there are a few challenges/limitations. Parent-child
documents are one way to update data. Does the friend list really need to
be searchable?
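To make the denormalization point concrete, here is a minimal in-memory sketch in plain Python (not Elasticsearch client code). It assumes one hypothetical "friend entry" document per (owner, friend) pair that copies the friend's name and username, so searching your own friends becomes a filter on owner_id plus a prefix match. The field names and sample data are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class FriendEntry:
    # One denormalized document per (owner, friend) pair.
    # Field names are hypothetical, chosen for illustration.
    owner_id: int
    friend_id: int
    name: str
    username: str


def build_entries(users, friendships):
    """Expand bidirectional friendships into per-owner entries.

    users: dict user_id -> (name, username)
    friendships: iterable of (a, b) pairs; each pair yields two entries.
    """
    entries = []
    for a, b in friendships:
        for owner, friend in ((a, b), (b, a)):
            name, username = users[friend]
            entries.append(FriendEntry(owner, friend, name, username))
    return entries


def search_friends(entries, owner_id, prefix):
    """An owner's friends whose name or username starts with prefix."""
    p = prefix.lower()
    return sorted(
        e.friend_id
        for e in entries
        if e.owner_id == owner_id
        and (e.name.lower().startswith(p) or e.username.lower().startswith(p))
    )


def entries_touched_by_rename(entries, user_id):
    """Every entry that embeds user_id's name must be rewritten when it changes."""
    return sum(1 for e in entries if e.friend_id == user_id)


users = {1: ("Alice", "alice"), 2: ("Bob", "bjones"), 3: ("Carol", "cwhite")}
friendships = [(1, 2), (1, 3), (2, 3)]
entries = build_entries(users, friendships)
print(search_friends(entries, 1, "bj"))        # [2]
print(entries_touched_by_rename(entries, 3))   # 2
```

The search side is cheap, but the copied names are what make renames expensive: with up to 50k friends per user, one rename can fan out to as many as 50k rewrites.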

--
Ivan

--

Hi Ivan,

We have already modeled this in Cassandra, and it works somewhat OK, but it has some limitations: for each user, we keep a reverse index of all their friends. The issue with this model is that every time a user changes their username or name, we have to update the indexes of all their friends. We do have a process for that, but from time to time, due to failures, we end up with out-of-date records. I wanted to see if this can be modeled more elegantly in ES.
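The failure mode described above can be reproduced with a small Python simulation. It assumes a hypothetical per-user reverse index stored as nested dicts and a `write_ok` callback that makes some writes fail; it is a model of the problem, not Cassandra code. Any owner whose write fails keeps the old name until something repairs it.

```python
def rename_fan_out(index, friends_of, user_id, new_name, write_ok):
    """Push user_id's new name into every friend's reverse index.

    index: dict owner_id -> dict friend_id -> name (per-user reverse index)
    friends_of: dict user_id -> set of friend ids
    write_ok: callable(owner_id) -> bool, simulating whether a write succeeds
    Returns the owners whose copy is now stale.
    """
    stale = []
    for owner in sorted(friends_of[user_id]):
        if write_ok(owner):
            index[owner][user_id] = new_name
        else:
            # The write was lost, so this owner still sees the old name.
            stale.append(owner)
    return stale


friends_of = {1: {2, 3, 4}, 2: {1}, 3: {1}, 4: {1}}
index = {owner: {1: "alice"} for owner in (2, 3, 4)}
failed = rename_fan_out(index, friends_of, 1, "alicia", lambda owner: owner != 3)
print(failed)        # [3]
print(index[3][1])   # alice
```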

-- Drew

On Dec 7, 2012, at 9:48 AM, Ivan Brusic ivan@brusic.com wrote:

--

From your description, the entities you work with are relational by
nature, and stale entries are a common problem as long as there is no
atomic update.

If you can live with stale entries for the task of search, you can continue
to model friends/user documents and user/friends documents and reindex
often, perhaps with a low-priority housekeeping thread that cleans up
erroneous documents in the background.
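A minimal sketch of the housekeeping idea, assuming the authoritative names live in a separate users table keyed by ID and the search-side copies are per-owner entries. The pass compares each copy with the source and rewrites only those that drifted; a real version would presumably run in batches in the background. All names and structures here are hypothetical.

```python
def housekeeping_pass(users, index):
    """Repair stale denormalized names.

    users: dict user_id -> current name (source of truth)
    index: dict owner_id -> dict friend_id -> copied name (search-side copies)
    Returns the number of copies that were repaired. Copies of users that are
    missing from the source are left alone in this sketch.
    """
    repaired = 0
    for owner, copies in index.items():
        for friend_id, copied_name in copies.items():
            current = users.get(friend_id)
            if current is not None and current != copied_name:
                # Overwriting an existing key does not change the dict's
                # size, so this is safe while iterating.
                copies[friend_id] = current
                repaired += 1
    return repaired


users = {1: "alicia", 2: "bob"}
index = {2: {1: "alicia"}, 3: {1: "alice", 2: "bob"}}
repaired = housekeeping_pass(users, index)
print(repaired)      # 1
print(index[3][1])   # alicia
```

Running the pass again is a no-op, so it can be repeated safely after any failed fan-out.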

If not, set up a transactional database with foreign key constraints (or
perhaps a graph database) and copy the user/friend graph to Elasticsearch
on a regular basis. Because user entities are referenced by ID, you can
rename users and friends in the database as often as you want.
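A sketch of the periodic-copy approach, assuming the transactional store keeps users by ID and friendships as ID pairs only, so a rename touches exactly one row there. The search-side documents are regenerated wholesale from that store. The function below only builds the documents; shipping them to Elasticsearch (for example through bulk indexing) is left out, and the document shapes are assumptions.

```python
def export_search_docs(users, friendships):
    """Regenerate search documents from the canonical store.

    users: dict user_id -> {"name": ..., "username": ...} (one row per user)
    friendships: set of frozenset({a, b}) pairs (IDs only, no names;
                 assumes no self-friendships)
    Returns (user_docs, friend_docs). Names are copied in at export time,
    so a rename in the store needs no fan-out; the next export picks it up.
    """
    user_docs = [{"id": uid, **fields} for uid, fields in sorted(users.items())]
    friend_docs = []
    for pair in friendships:
        a, b = sorted(pair)
        for owner, friend in ((a, b), (b, a)):
            friend_docs.append({"owner_id": owner, "friend_id": friend, **users[friend]})
    friend_docs.sort(key=lambda d: (d["owner_id"], d["friend_id"]))
    return user_docs, friend_docs


users = {1: {"name": "Alice", "username": "alice"}, 2: {"name": "Bob", "username": "bjones"}}
friendships = {frozenset({1, 2})}
users[1]["username"] = "alicia"  # a rename is a single-row update in the store
user_docs, friend_docs = export_search_docs(users, friendships)
print(friend_docs[1])  # {'owner_id': 2, 'friend_id': 1, 'name': 'Alice', 'username': 'alicia'}
```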

Best regards,

Jörg

--

Not sure why this conversation turned into RDBMS vs NoSQL. I have already accepted the tradeoffs of using NoSQL, and I'm pretty comfortable with the non-relational world and its constraints. My original question was simply about finding the best way to model the problem in ES. I also listed all the constraints because, with NoSQL solutions, you need to design your data model around the queries you are going to run, unlike the relational world, where ad hoc queries are more common. I'm still waiting on an answer to my original question :wink:
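As an illustration of designing the data model around the query, here is a Python sketch for requirement 1 (site-wide search by name or username) only. It precomputes the leading substrings of every word, roughly what an edge n-gram analyzer does in Elasticsearch, so each search is a single lookup. The gram lengths and sample data are assumptions.

```python
from collections import defaultdict


def edge_ngrams(text, min_gram=1, max_gram=10):
    """Leading substrings of each lowercased word, like an edge n-gram analyzer.

    Queries longer than max_gram will not match in this sketch.
    """
    grams = set()
    for word in text.lower().split():
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            grams.add(word[:n])
    return grams


def build_prefix_index(users):
    """Map each prefix to the ids of users whose name or username has it.

    users: dict user_id -> (name, username)
    """
    index = defaultdict(set)
    for uid, (name, username) in users.items():
        for gram in edge_ngrams(name) | edge_ngrams(username):
            index[gram].add(uid)
    return index


def search_users(index, query):
    # The lookup is keyed on exactly what the query needs: a lowercased prefix.
    return sorted(index.get(query.lower(), set()))


users = {1: ("Alice Smith", "alice"), 2: ("Bob Jones", "bjones"), 3: ("Carol White", "cwhite")}
index = build_prefix_index(users)
print(search_users(index, "jo"))  # [2]
print(search_users(index, "al"))  # [1]
```

For this query a rename only touches the renamed user's own entries; the fan-out problem is specific to requirement 2, where other users' documents carry copies of the name.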

cheers,

Drew

On Dec 7, 2012, at 6:14 PM, Jörg Prante joergprante@gmail.com wrote:

--

Ask n people how to design this solution and you will get n+1 answers.
The problem you face is challenging even for an RDBMS. I have seen a few
solutions proposed for MongoDB, but MongoDB can easily update a single
document, while ES cannot.

One solution that should work in terms of functionality would be to use
parent-child relationships, where each friend is a child document holding
that friend's user id, username, and name. I am assuming everyone has a
unique id. Any name change would then execute a top_children query with
the user's id. Note that I said the functionality should work; I have no
clue about performance, which depends solely on the volume of documents and
updates.
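One practical difference between parent-child and embedding all friends in a single user document is how much has to be rewritten on a rename, since an Elasticsearch update reindexes the whole document. The back-of-the-envelope Python sketch below counts friend entries rewritten for one rename under both layouts, using hypothetical friend counts. It models write cost only and says nothing about query performance.

```python
def rewritten_entries(friend_counts, renamed_user_friends):
    """Count friend entries rewritten when one user is renamed.

    friend_counts: dict owner_id -> number of friends that owner has
    renamed_user_friends: owner ids that list the renamed user as a friend
    Returns (embedded_layout_cost, parent_child_layout_cost).
    """
    # Embedded layout: each affected owner's single document holds all of
    # that owner's friends, and the whole document is reindexed.
    embedded = sum(friend_counts[o] for o in renamed_user_friends)
    # Parent-child layout: each affected owner has one small child document
    # for the renamed user, so only that child is reindexed.
    parent_child = len(renamed_user_friends)
    return embedded, parent_child


friend_counts = {10: 50_000, 11: 200, 12: 3}
print(rewritten_entries(friend_counts, [10, 11, 12]))  # (50203, 3)
```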

I have not tried the "new" update functionality. Can it work with nested
objects or top_children queries?

--
Ivan

On Sat, Dec 8, 2012 at 1:14 AM, Drew Kutcharian drew@venarc.com wrote:

--