I need to add multi-tenant tagging and commenting support to a system whose primary datastore is Elasticsearch (ES) 2.3. There are currently around 5 million public documents in a single index spread across 5 shards in a 2-node cluster. New documents are added every minute, averaging about 2,500 per day. Documents themselves are never updated or deleted.
Requirements:
- each tenant must be able to add tags and comments to any document, and these must be viewable only by that tenant
- admins must be able to add global tags and comments to any document; these are viewable by all tenants and shown inline with each tenant's own tags and comments
- comments will not be threaded, only chronological
- queries (described below) must also support conditions on root document fields such as title, author, creation_time, etc.
- must maintain fast query response times
- don't allow any one tenant to affect performance of another
- don't disrupt ability to easily horizontally scale
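To pin down the visibility rule from the first three requirements, here is a minimal sketch in plain Python, independent of any datastore. The `Comment` class, its field names, and the convention that a `tenant_id` of `None` marks a global (admin) comment are assumptions made up for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical convention: tenant_id of None marks an admin (global) comment.
GLOBAL = None


@dataclass
class Comment:
    doc_id: str
    tenant_id: Optional[str]
    text: str
    created_at: int  # epoch seconds


def visible_comments(comments: List[Comment], doc_id: str, tenant_id: str) -> List[Comment]:
    """Return the global plus tenant-owned comments for one document, oldest first."""
    visible = [
        c for c in comments
        if c.doc_id == doc_id and c.tenant_id in (GLOBAL, tenant_id)
    ]
    # No threading: a flat list ordered by timestamp.
    return sorted(visible, key=lambda c: c.created_at)
```

Whatever storage option is chosen would need to reproduce this behaviour, ideally inside the datastore rather than in application code.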
Must have queries:
- get global and tenant-specific tags for all documents on the list page (paged via infinite scroll) in as few queries as possible
- get all global and tenant-specific tags for a single document on the detail page
- get paged global and tenant-specific comments for a single document, ordered by timestamp
- ability to query documents based on [non]existence of global and tenant-specific tags
- ability to query documents that don't have any global or tenant-specific tags
- ability to query documents that don't have any global or tenant-specific comments
- ability to query documents that have a global or tenant-specific comment within the past X timeframe
- ability to add one or many global or tenant-specific tags to a single document or a list of bulk-selected documents with as few queries as possible
- ability to add a global or tenant-specific comment to a single document
- ability to autocomplete search amongst all of a specific tenant's tags
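To make a few of these concrete, here is a rough sketch of how they might look as ES 2.3 query bodies if tags and comments were stored as child documents of the root document (the parent/child option listed below). The child types `tag` and `comment`, the fields `tenant_id`, `tag`, and `created_at`, and the `_global` marker for admin entries are all hypothetical names chosen for illustration. The functions only build request bodies and do not talk to a cluster.

```python
# Hypothetical marker stored in tenant_id for admin (global) tags/comments.
GLOBAL_TENANT = "_global"


def visible_to(tenant_id):
    """Filter matching child docs owned by the tenant or marked global."""
    return {"terms": {"tenant_id": [tenant_id, GLOBAL_TENANT]}}


def has_child(child_type, filters):
    """Wrap child-side filters in a has_child query for the given child type."""
    return {"has_child": {"type": child_type,
                          "query": {"bool": {"filter": filters}}}}


def _wrap(root_filters, extra_filter=None, must_not=None):
    """Combine root-document filters (title, author, ...) with a child clause."""
    filters = list(root_filters or [])
    if extra_filter is not None:
        filters.append(extra_filter)
    bool_query = {}
    if filters:
        bool_query["filter"] = filters
    if must_not is not None:
        bool_query["must_not"] = [must_not]
    return {"query": {"bool": bool_query}}


def docs_with_tag(tenant_id, tag, root_filters=None):
    """Documents carrying a given global or tenant-specific tag."""
    child = has_child("tag", [{"term": {"tag": tag}}, visible_to(tenant_id)])
    return _wrap(root_filters, extra_filter=child)


def docs_without_any_tag(tenant_id, root_filters=None):
    """Documents with no global and no tenant-specific tags at all."""
    child = has_child("tag", [visible_to(tenant_id)])
    return _wrap(root_filters, must_not=child)


def docs_commented_since(tenant_id, since, root_filters=None):
    """Documents with a visible comment at or after `since` (e.g. "now-7d")."""
    child = has_child("comment", [{"range": {"created_at": {"gte": since}}},
                                  visible_to(tenant_id)])
    return _wrap(root_filters, extra_filter=child)
```

For example, `docs_with_tag("t1", "urgent", [{"term": {"author": "bob"}}])` would combine a root-field condition with a tag condition in a single request. Whether queries of this shape stay fast at my data volume is exactly what I am unsure about.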
Nice to have queries:
- ability to query list of most recent tags amongst all tenants or a specific tenant
- ability to query list of most used tags (with a document count per tag) amongst all tenants or a specific tenant within a specified time period (histogram)
- ability to query a list of most recent comments amongst all tenants or a specific tenant
- ability to query number of tags/comments applied amongst all tenants or a specific tenant within a specified time period (histogram)
- ability to bulk delete tags across one or all tenants
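As an illustration of the histogram-style queries, here is a sketch of an aggregation request body, assuming each applied tag is stored as its own ES document with `tenant_id`, `tag` (not analyzed), and `created_at` fields. Those field names and the function itself are hypothetical; it only builds the body and does not send it anywhere.

```python
def tag_usage_histogram(start, end, interval="day", tenant_id=None, top_n=10):
    """Build a search body: most used tags in [start, end), bucketed over time.

    Pass tenant_id to restrict to one tenant, or leave it as None for all tenants.
    """
    filters = [{"range": {"created_at": {"gte": start, "lt": end}}}]
    if tenant_id is not None:
        filters.append({"term": {"tenant_id": tenant_id}})
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "query": {"bool": {"filter": filters}},
        "aggs": {
            "top_tags": {
                "terms": {"field": "tag", "size": top_n},
                "aggs": {
                    "over_time": {
                        "date_histogram": {"field": "created_at",
                                           "interval": interval}
                    }
                },
            }
        },
    }
```

Note that the bucket counts here are counts of tag applications. Across all tenants, the same document tagged by two tenants would be counted twice, so a true per-tag document count would need something extra.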
Potential options:
- ES Inner Objects - concerns include one tenant affecting performance for others by adding thousands of tags/comments, and the inability to retrieve a document with only a specific tenant's (and global) tags/comments; all of them would come back and have to be filtered out in code
- ES Nested Documents - the main concern is that adding a new tag/comment forces a reindex of the entire root document, which could take a lot of time/resources if many tags/comments exist for many tenants
- ES Parent/Child Types - concerns include memory overhead and potential difficulty with some of the nice-to-have queries identified above
- Denormalization (store tags/comments in another datastore such as MongoDB or Postgres) - concerns include having to query two data stores and merge results in code which could greatly affect performance, especially for list view where I need all tags for all displayed documents
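To show what the two-datastore merge for the list view might involve, here is a minimal sketch. It uses Python's built-in `sqlite3` purely as a stand-in for Postgres, and the `tags` table, its columns, the `NULL` tenant convention for global tags, and the shape of the ES hits are all assumptions for illustration.

```python
import sqlite3
from collections import defaultdict


def tags_for_page(conn, doc_ids, tenant_id):
    """Fetch global plus tenant-specific tags for one page of documents in a single query."""
    if not doc_ids:
        return {}
    placeholders = ",".join("?" for _ in doc_ids)
    sql = (
        "SELECT doc_id, tag FROM tags "
        "WHERE doc_id IN (%s) AND (tenant_id = ? OR tenant_id IS NULL) "
        "ORDER BY doc_id, tag" % placeholders
    )
    tags = defaultdict(list)
    for doc_id, tag in conn.execute(sql, list(doc_ids) + [tenant_id]):
        tags[doc_id].append(tag)
    # Every requested document gets an entry, even when it has no tags.
    return {doc_id: tags.get(doc_id, []) for doc_id in doc_ids}


def merge_page(hits, tags_by_doc):
    """Attach tags to each ES hit (assumed here to be dicts with an "_id" key)."""
    return [dict(hit, tags=tags_by_doc.get(hit["_id"], [])) for hit in hits]
```

This keeps the list page at one ES query plus one lookup per page, but it does nothing for queries that filter documents by tag or comment existence combined with root fields, which is where I expect this option to hurt most.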
I would appreciate any architectural insights or recommendations based on my needs.