Design advice/validation for a complex music streaming service

Loris_Guignard · March 29, 2013, 11:52am

Hello everyone,

I'm new to ES and have started using it for a fairly complex project.
It is basically a music streaming service (think Spotify or Deezer) with
a substantial amount of music data and user data.

I don't have many questions or issues with the music data part. I was
thinking about packing everything into a single "catalog" index with the
following types: "artist", "album", "track", "label", "genre", ... Data
duplication doesn't look like much of a problem here, I guess: for instance,
artist data would live in the "artist" document and be duplicated in every
"track" document performed by that artist. With that approach, both
indexing and querying look pretty simple, performant and suited to our
needs (facets, sorting on any field, etc.).
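
To make the duplication concrete, here is a minimal sketch in Python of how a denormalized "track" document could be assembled before indexing. The field names (title, play_count, release_date, ...) and the build_track_document helper are hypothetical, made up for illustration; this is not an actual mapping.

```python
def build_track_document(track, artist, album):
    """Return a denormalized "track" document with artist and album data embedded."""
    return {
        "title": track["title"],
        "play_count": track.get("play_count", 0),
        # Artist data is copied from the "artist" document (hypothetical fields).
        "artist": {"id": artist["id"], "name": artist["name"]},
        # Album data is copied as well, so tracks can be sorted by release date.
        "album": {
            "id": album["id"],
            "title": album["title"],
            "release_date": album["release_date"],
        },
    }


# Hypothetical sample data, for illustration only.
artist = {"id": "artist-1", "name": "Example Artist"}
album = {"id": "album-1", "title": "Example Album", "release_date": "2013-01-15"}
track = {"id": "track-1", "title": "Example Track", "play_count": 12}

track_doc = build_track_document(track, artist, album)
```

The same artist dict would be copied into every track by that artist, so a change to the artist means re-indexing all of those tracks. For slowly changing catalog data that seems acceptable.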

Where I need your help is the "user" part.

Every user on our site can:

purchase tracks and albums
mark tracks, albums and artists as favorites
build playlists from albums and tracks

I went through the last few months of threads in this Google Group, read the
ES documentation and did a few Google searches. Even so, I'm still not sure
which design pattern (or combination of patterns) I should go with.
Right now, I am thinking about using parent/child relationships with the
following index design:

A "catalog" index with all the music metadata
A "user" index with the following types:

favorite_track
favorite_album
purchase_album
playlist_track
...
Each document would contain a _parent reference to the corresponding
document in the "catalog" index.
Is this even possible (a _parent reference to a document in another
index)? Will I be able to run some advanced queries (e.g. sort a user's
favorite tracks by album release date)?

What would be the proper design pattern in terms of relationships (flat,
nested objects, parent/child, ...) to meet the following requirements (in
order of priority):

fitting our functional needs (users should be able to search their
whole collection of data, sort by music metadata, use facets, etc.)
query performance/simplicity
indexing performance/simplicity: for instance, track documents in the
"catalog" index have frequently updated properties (such as play_count). If
this data is duplicated in thousands of places (think of every user who
purchased, favorited or playlisted that track), we might have a huge
indexing problem.
server resources: RAM/storage needed
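
To put a rough number on the indexing concern above, here is a back-of-the-envelope sketch in Python. The figures (100 play_count updates per day, 10,000 user-side copies of one track) are purely hypothetical, and documents_rewritten is a helper made up for this illustration.

```python
def documents_rewritten(updates, user_copies, duplicated):
    """Count document writes caused by `updates` changes to a single track.

    With duplication, every update touches the catalog document plus each
    user-side copy; with references only, it touches the catalog document alone.
    """
    per_update = (1 + user_copies) if duplicated else 1
    return updates * per_update


updates_per_day = 100   # hypothetical: play_count changes for one track
user_copies = 10_000    # hypothetical: users who purchased/favorited/playlisted it

with_duplication = documents_rewritten(updates_per_day, user_copies, duplicated=True)
reference_only = documents_rewritten(updates_per_day, user_copies, duplicated=False)

print(with_duplication, reference_only)  # prints: 1000100 100
```

Under these assumptions, duplicating play_count into user documents multiplies the write load for a single popular track by roughly the number of users who hold a copy of it.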

Any help or advice on my problem would be greatly appreciated!
Thanks in advance.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Loris_Guignard · March 29, 2013, 3:10pm

Actually, I read that parent/child relationships are not available across
indices, and nested objects would be a nightmare to maintain (a small track
update could mean re-indexing tens of thousands of documents).
However, I came across the following upcoming feature:
Query DSL: Terms filter to allow for terms lookup from another document · Issue #2674 · elastic/elasticsearch · GitHub
(in v0.90.0.Beta1), which sounds pretty interesting for my case.
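
Here is a rough sketch, in Python, of the kind of search body I imagine sending to the "catalog" index with that feature. It assumes the lookup takes index, type, id and path parameters as described in the issue. Everything else is hypothetical: a "favorites" document per user in the "user" index holding a track_ids array, a track_id field on each catalog track, and an album.release_date field to sort on.

```python
import json


def favorite_tracks_query(user_id, sort_field="album.release_date"):
    """Build a search body that keeps only the tracks listed in a user's favorites.

    The terms are not sent with the query; they are looked up from another
    document (assumed here: index "user", type "favorites", field "track_ids").
    """
    return {
        "query": {
            "filtered": {
                "query": {"match_all": {}},
                "filter": {
                    "terms": {
                        # Hypothetical field on catalog track documents.
                        "track_id": {
                            "index": "user",
                            "type": "favorites",
                            "id": user_id,
                            "path": "track_ids",
                        }
                    }
                },
            }
        },
        "sort": [{sort_field: {"order": "desc"}}],
    }


body = favorite_tracks_query("user-42")
print(json.dumps(body, indent=2))
```

With this layout, a play_count update would only touch the catalog document, and adding a favorite would only touch the user's own document.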

What do you guys think?


Loris_Guignard · April 2, 2013, 8:15am

Does no one have any advice on this topic?
Is ES really a good fit for searching large shared data sets that also have
per-user constraints (i.e., millions of music tracks that can be added to
user playlists)?

