Data model for a multi-tenant tagging/commenting system

I need to add multi-tenant tagging and commenting support to a system whose primary datastore is Elasticsearch (ES) 2.3. There are currently around 5 million public documents in a single index spread across 5 shards in a 2-node cluster. New documents arrive every minute, averaging about 2500 per day. The documents themselves are never updated or deleted.

Requirements:

  • each tenant must be able to add tags and comments to each document that are viewable only by that tenant
  • admins must be able to add global tags and comments to each document that are viewable by all tenants and inlined with the tenant's tags and comments
  • comments will not be threaded, only chronological
  • queries (described below) must also support conditions against the root document fields such as title, author, creation_time, etc.
  • must maintain fast query response times
  • don't allow any one tenant to affect the performance of another
  • don't disrupt ability to easily horizontally scale
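
To make the visibility rule concrete, this is roughly how I think about it, written as a plain Python sketch that is not tied to any datastore. The `GLOBAL` marker, the class names, and the field names are placeholders I made up for illustration, not part of the existing system.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, List, Set

GLOBAL = "__global__"  # hypothetical owner value for admin (global) annotations


@dataclass(frozen=True)
class Tag:
    doc_id: str
    owner: str  # a tenant id, or GLOBAL for admin tags
    name: str


@dataclass(frozen=True)
class Comment:
    doc_id: str
    owner: str  # a tenant id, or GLOBAL for admin comments
    body: str
    created_at: datetime


def visible_tags(tags: Iterable[Tag], doc_id: str, tenant: str) -> Set[str]:
    """Tag names a tenant sees on one document: its own plus the global ones."""
    return {t.name for t in tags
            if t.doc_id == doc_id and t.owner in (tenant, GLOBAL)}


def visible_comments(comments: Iterable[Comment], doc_id: str, tenant: str,
                     offset: int = 0, limit: int = 20) -> List[Comment]:
    """One page of comments for a document, oldest first, global ones inlined."""
    visible = [c for c in comments
               if c.doc_id == doc_id and c.owner in (tenant, GLOBAL)]
    visible.sort(key=lambda c: c.created_at)  # chronological, not threaded
    return visible[offset:offset + limit]
```

Whatever storage option I end up with has to reproduce this behaviour: a tenant never sees another tenant's annotations, and global ones are merged in rather than fetched separately.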

Must have queries:

  • get global and tenant specific tags for all documents in as few queries as possible for list page (paged via infinite scroll)
  • get all global and tenant specific tags for a single document on detail page
  • get paged global and tenant specific comments for a single document ordered by timestamp
  • ability to query documents based on [non]existence of global and tenant specific tags
  • ability to query documents that don't have any global or tenant specific tags
  • ability to query documents that don't have any global or tenant specific comments
  • ability to query documents that have a global or tenant specific comment within past X timeframe
  • ability to add one or many global or tenant specific tags to a single document or a list of bulk-selected documents with as few queries as possible
  • ability to add a global or tenant specific comment to a single document
  • ability to autocomplete search amongst all of a specific tenant's tags

Nice to have queries:

  • ability to query list of most recent tags amongst all tenants or a specific tenant
  • ability to query list of most used tags (with a document count for each) amongst all tenants or a specific tenant within a specified time period (histogram)
  • ability to query a list of most recent comments amongst all tenants or a specific tenant
  • ability to query number of tags/comments applied amongst all tenants or a specific tenant within a specified time period (histogram)
  • ability to bulk delete tags across one or all tenants
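
For the histogram-style queries, this is a rough sketch of the kind of request body I imagine, assuming each tag application ends up stored as its own ES document with `tenant_id`, `name`, and `created_at` fields (all hypothetical names). It only builds the JSON body as a Python dict; whether this shape is achievable depends on which storage option is chosen.

```python
def most_used_tags_body(tenant=None, since="now-30d", interval="day", size=10):
    """Search body: top tag names per time bucket, optionally for one tenant."""
    filters = [{"range": {"created_at": {"gte": since}}}]
    if tenant is not None:
        # Restrict to a single tenant; omit to aggregate across all tenants.
        filters.append({"term": {"tenant_id": tenant}})
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "query": {"bool": {"filter": filters}},
        "aggs": {
            "over_time": {
                "date_histogram": {"field": "created_at", "interval": interval},
                "aggs": {
                    "top_tags": {"terms": {"field": "name", "size": size}},
                },
            },
        },
    }
```

Each `top_tags` bucket's `doc_count` would count tag applications, which matches the per-tag document count only if a tag is stored at most once per document and tenant.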

Potential options:

  • ES Inner Objects - concerns include one tenant affecting performance for others by adding thousands of tags/comments, and the inability to retrieve a document with only a specific tenant's (and global) tags/comments, meaning all of them would have to be fetched and filtered in application code
  • ES Nested Documents - concerns include adding new tags/comments forces a reindex of the entire root document, which could take a lot of time/resources if many tags/comments exist for many tenants
  • ES Parent/Child Types - concerns include memory overhead, potential difficulty with some of the nice to have queries identified above
  • Denormalization (store tags/comments in another datastore such as MongoDB or Postgres) - concerns include having to query two data stores and merge results in code which could greatly affect performance, especially for list view where I need all tags for all displayed documents
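
To make the parent/child option more concrete, here is a minimal sketch of what I have in mind, written as Python dicts in ES 2.x query DSL. The type names (`document`, `tag`, `comment`), the field names, and the `GLOBAL` marker are assumptions for illustration; the mappings would be supplied when the index is created.

```python
GLOBAL = "__global__"  # hypothetical tenant_id value for admin-owned annotations


def annotation_mappings():
    """Child type mappings whose parent is the (assumed) 'document' type."""
    keyword = {"type": "string", "index": "not_analyzed"}  # exact-match string in ES 2.x
    return {
        "tag": {
            "_parent": {"type": "document"},
            "properties": {"tenant_id": keyword, "name": keyword,
                           "created_at": {"type": "date"}},
        },
        "comment": {
            "_parent": {"type": "document"},
            "properties": {"tenant_id": keyword, "body": {"type": "string"},
                           "created_at": {"type": "date"}},
        },
    }


def _visible_to(tenant):
    # A tenant sees its own annotations plus the global ones.
    return {"terms": {"tenant_id": [tenant, GLOBAL]}}


def docs_with_tag(tenant, name, root_filters=()):
    """Root documents carrying a given tag that is visible to the tenant."""
    child = {"bool": {"filter": [_visible_to(tenant), {"term": {"name": name}}]}}
    has_tag = {"has_child": {"type": "tag", "query": child}}
    return {"query": {"bool": {"filter": list(root_filters) + [has_tag]}}}


def docs_without_tags(tenant, root_filters=()):
    """Root documents with no tenant-visible (own or global) tags at all."""
    clause = {"must_not": [{"has_child": {"type": "tag",
                                          "query": _visible_to(tenant)}}]}
    if root_filters:
        clause["filter"] = list(root_filters)
    return {"query": {"bool": clause}}
```

For example, `docs_with_tag("acme", "urgent", [{"range": {"creation_time": {"gte": "now-7d"}}}])` would combine a root-field condition with the tag condition in a single search, which is the kind of query I need to keep fast.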

I would appreciate any architectural insights or recommendations based on my needs.