Design advice/validation for a complex music streaming service

Hello everyone,

I'm new to ES and have started using it for a fairly complex project.
It is basically a music streaming service (think Spotify or Deezer) with
a substantial amount of music data and user data.

I don't have many questions about the music data part. I was thinking
about putting everything in a single "catalog" index with the following
types: "artist", "album", "track", "label", "genre", ... Data duplication
doesn't look like much of a problem here: for instance, artist data would
live in the "artist" document and be duplicated in every "track" document
performed by that artist. With this layout, both indexing and querying
look simple, fast and suited to our needs (facets, sorting on any field,
etc.).
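
To make the duplication concrete, here is a minimal sketch of how a denormalized "track" document could be assembled before indexing. The field names (title, duration_s, release_date, ...) and the helper build_track_doc are hypothetical and only illustrate the idea; they are not an actual schema.

```python
def build_track_doc(track, album, artist):
    """Return a denormalized "track" document with album/artist data copied in."""
    return {
        "title": track["title"],
        "duration_s": track["duration_s"],
        # Copied from the "album" document so tracks can be sorted/faceted on it.
        "album": {
            "id": album["id"],
            "title": album["title"],
            "release_date": album["release_date"],
        },
        # Copied from the "artist" document (the duplication discussed above).
        "artist": {"id": artist["id"], "name": artist["name"]},
    }


# Hypothetical sample records.
artist = {"id": "ar1", "name": "Example Artist"}
album = {"id": "al1", "title": "Example Album", "release_date": "2012-05-01"}
track = {"id": "t1", "title": "Example Track", "duration_s": 215}

doc = build_track_doc(track, album, artist)
```

The trade-off is that renaming an artist means re-indexing every track that embeds it, which is acceptable here only because artist data changes rarely.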

Where I need your help is the "user" part.

Every user on our site can:

  • purchase tracks and albums
  • mark tracks, albums and artists as favorites
  • build playlists from albums and tracks

I went through the last few months of threads in this Google group, read
the ES documentation and did a few Google searches. Even so, I'm still not
sure which design pattern (or combination of patterns) I should go with.
Right now, I am thinking about using parent/child relationships with the
following index design:

  • A "catalog" index with all the music metadata
  • A "user" index with the following types:
      • favorite_track
      • favorite_album
      • purchase_album
      • playlist_track
      • ...
    Each document would contain a _parent reference to the corresponding
    document in the "catalog" index.

Is this even possible (a _parent reference to a document in another
index)? And would I still be able to run advanced queries, such as
sorting a user's favorite tracks by album release date?
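
To spell out what that last query has to achieve, here is a plain-Python sketch of the logical join: user documents hold only a reference to a track, and the sort key lives on the catalog side. This is not ES code; the sample data, field names and the helper favorites_by_release_date are all made up for illustration.

```python
# Hypothetical catalog tracks keyed by id (dates as ISO strings, so they
# sort correctly as plain strings).
catalog_tracks = {
    "t1": {"title": "Track One", "album_release_date": "2011-03-15"},
    "t2": {"title": "Track Two", "album_release_date": "2009-07-01"},
    "t3": {"title": "Track Three", "album_release_date": "2012-11-20"},
}

# Hypothetical "favorite_track" documents: a reference only, no music metadata.
favorite_tracks = [
    {"user_id": "u1", "track_id": "t3"},
    {"user_id": "u1", "track_id": "t1"},
    {"user_id": "u2", "track_id": "t2"},
]


def favorites_by_release_date(user_id):
    """Resolve a user's favorite references and sort them by album release date."""
    ids = [f["track_id"] for f in favorite_tracks if f["user_id"] == user_id]
    tracks = [dict(catalog_tracks[i], id=i) for i in ids]
    return sorted(tracks, key=lambda t: t["album_release_date"])


result = favorites_by_release_date("u1")
```

Whatever relationship model is chosen has to perform this lookup-then-sort step somewhere, either inside ES or in the application.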

What would be the proper design pattern in terms of relationships (flat,
nested objects, parent/child, ...) to meet the following requirements (in
order of priority):

  • fitting our functional needs (users should be able to search their
    whole collection, sort by music metadata, use facets, etc.)
  • query performance/simplicity
  • indexing performance/simplicity: for instance, track documents in the
    "catalog" index have frequently updated properties (such as
    play_count). If this data is duplicated in thousands of places (think
    of every user who purchased, favorited or playlisted the track), we
    might have a huge indexing problem.
  • server resources: RAM/storage needed
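
The indexing concern can be put into rough numbers with a small back-of-the-envelope sketch. The figures below (100 play_count updates per day, 10,000 user documents referencing one track) are invented purely for illustration, and daily_index_writes is a hypothetical helper.

```python
def daily_index_writes(play_count_updates, user_docs_per_track, duplicated):
    """Count index writes per day for one track.

    If track data is duplicated into user documents, every update touches
    the catalog document plus each copy; otherwise only the catalog document.
    """
    copies = 1 + (user_docs_per_track if duplicated else 0)
    return play_count_updates * copies


# Assumed numbers, for illustration only.
with_duplication = daily_index_writes(100, 10_000, duplicated=True)
with_references = daily_index_writes(100, 10_000, duplicated=False)
```

With these assumed numbers the duplicated layout needs 1,000,100 writes per day for a single popular track, against 100 when user documents only reference the track.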

Any help or advice on my problem would be greatly appreciated!
Thanks in advance.

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Actually, I read that parent/child relationships are not available across
indices, and nested objects would be a nightmare to maintain (a small
track update could mean re-indexing tens of thousands of documents).
However, I came across the following upcoming feature (in v0.90.0.Beta1),
which sounds pretty interesting for my case:
https://github.com/elasticsearch/elasticsearch/issues/2674
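
As far as I understand it, that issue describes a lookup mechanism for the terms filter, where the list of terms is read from a field of another document (possibly in another index). Below is a rough sketch of what a search body against the "catalog" index might look like under that assumption. The "favorites" type, the track_ids and track_id fields, and the helper favorites_search_body are hypothetical, and the exact body shape should be checked against the documentation for the version in use.

```python
import json


def favorites_search_body(user_id, text):
    """Build a search body restricted to the tracks a user has favorited.

    Assumes a document user/favorites/<user_id> whose "track_ids" field
    lists catalog track ids (hypothetical layout).
    """
    return {
        "query": {
            "filtered": {
                "query": {"match": {"title": text}},
                "filter": {
                    "terms": {
                        # Field on the catalog track document to match against.
                        "track_id": {
                            "index": "user",
                            "type": "favorites",
                            "id": user_id,
                            "path": "track_ids",
                        }
                    }
                },
            }
        },
        # Sorting uses catalog-side metadata, so nothing is duplicated per user.
        "sort": [{"album.release_date": {"order": "desc"}}],
    }


body = favorites_search_body("u1", "love")
print(json.dumps(body, indent=2))
```

With this layout a play_count update would touch only the catalog document, and adding a favorite would touch only the user's document.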

What do you guys think?

On Friday, March 29, 2013 12:52:19 PM UTC+1, Loris Guignard wrote:


Does no one have any advice on this topic?
Is ES really a good fit for searching large shared data sets that also
have per-user constraints (i.e. millions of music tracks that can be added
to user playlists)?
