I have a system similar to a blog: posts, comments, and likes on both posts and comments. I am considering modeling the blog posts in Elasticsearch (ES) and syncing data from Postgres to ES whenever documents are updated.
I don't expect more than 50-100 concurrent users for now, but possibly 1000 concurrent users in the future.
Because users can like comments and posts, both may get updated frequently. Should I model comments as a separate index? If I embed them within my posts (denormalization), the post document will be reindexed every time a comment is updated.
Similarly, since comments and posts can have likes, should I model likes as a separate index, so that updating a like does not cause the comment or post to be reindexed on every update?
One of the reasons for the question is that I plan for the UI to display a list of blog posts, and clicking on a post shows its comments and likes.
I can share our experience from 2-3 years of using Elasticsearch with content like yours (blogs, comments, and likes).
We denormalize the data and store blogs in a blog index, and comments and likes in their own monthly indices.
So far we haven't had any trouble or performance issues.
Basically we treat comments and likes as logs (Elasticsearch is good at this): one document per like, with a target_related_id (the id of the blog or comment) and a target_kind (either blog or comment).
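A minimal sketch of this like-as-log model in Python. Only `target_related_id` and `target_kind` come from the description above; the `likes-YYYY.MM` index naming, the `user_id` and `created_at` fields, and the helper names are assumptions made for illustration.

```python
from datetime import datetime, timezone

ALLOWED_KINDS = ("blog", "comment")


def monthly_index(prefix: str, when: datetime) -> str:
    """Return a monthly index name such as 'likes-2024.03' (naming is an assumption)."""
    return f"{prefix}-{when:%Y.%m}"


def like_action(user_id: str, target_kind: str, target_related_id: str,
                when: datetime) -> dict:
    """Build one append-only like document and the monthly index it belongs to."""
    if target_kind not in ALLOWED_KINDS:
        raise ValueError(f"target_kind must be one of {ALLOWED_KINDS}")
    return {
        "index": monthly_index("likes", when),
        "document": {
            "user_id": user_id,                      # hypothetical field
            "target_kind": target_kind,              # "blog" or "comment"
            "target_related_id": target_related_id,  # id of the blog or comment
            "created_at": when.isoformat(),          # hypothetical field
        },
    }


action = like_action("u1", "blog", "42", datetime(2024, 3, 5, tzinfo=timezone.utc))
print(action["index"])
```

Because each like is a new document, liking a post never touches the post document itself, so nothing in the blog index is reindexed.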
Then, to display a post, we do a get by id for the blog detail and one multi_search (i.e. one request) for the comments and likes of that blog. You will probably need a third request if you have likes on the comments.
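As a rough sketch, the single multi_search request for one post could be built like this. The `_msearch` endpoint takes newline-delimited JSON made of header/body pairs and must end with a newline. The index patterns `comments-*` and `likes-*`, the `created_at` field, and the assumption that the id fields are mapped as keywords are all illustrative, not taken from the setup described above.

```python
import json


def build_msearch_body(blog_id: str, max_comments: int = 50) -> str:
    """Build an NDJSON _msearch body: comments of a blog, plus its like count."""
    target_filter = [
        {"term": {"target_kind": "blog"}},
        {"term": {"target_related_id": blog_id}},
    ]
    searches = [
        # Search 1: the comments themselves, oldest first.
        ({"index": "comments-*"},
         {"size": max_comments,
          "query": {"bool": {"filter": target_filter}},
          "sort": [{"created_at": "asc"}]}),
        # Search 2: only the number of likes, no hits returned.
        ({"index": "likes-*"},
         {"size": 0,
          "track_total_hits": True,
          "query": {"bool": {"filter": target_filter}}}),
    ]
    lines = []
    for header, body in searches:
        lines.append(json.dumps(header))
        lines.append(json.dumps(body))
    return "\n".join(lines) + "\n"  # trailing newline is required by _msearch


print(build_msearch_body("42"))
```

The responses come back in the same order as the searches, so the first entry holds the comment hits and the second holds the like total.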
For the list it can be more complex, depending on the order. As we sort by ranking (a complex formula based on likes, comments, tags, etc.), we get the blog ids from this index and then fetch the details with a multi_search.
We also store blog tags the same way, and to get trends we aggregate on them. This part depends on your needs.
We also store view counts the same way as likes and comments, and use them for ranking.
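One possible way to compute trends from tag documents is a terms aggregation over a recent time window. This is only a sketch: the `tag` and `created_at` field names, the 7-day window, and the aggregation name `trending_tags` are assumptions, and the sample response is hand-written to show the shape Elasticsearch returns for a terms aggregation.

```python
def trending_tags_query(window: str = "now-7d", top_n: int = 10) -> dict:
    """Search body that counts tag documents per tag within a recent window."""
    return {
        "size": 0,  # only the aggregation is needed, not the hits
        "query": {"range": {"created_at": {"gte": window}}},
        "aggs": {"trending_tags": {"terms": {"field": "tag", "size": top_n}}},
    }


def parse_trending_tags(response: dict) -> list:
    """Extract (tag, count) pairs from a terms aggregation response."""
    buckets = response["aggregations"]["trending_tags"]["buckets"]
    return [(b["key"], b["doc_count"]) for b in buckets]


# Hand-written example response, for illustration only.
sample_response = {
    "aggregations": {
        "trending_tags": {
            "buckets": [
                {"key": "elasticsearch", "doc_count": 12},
                {"key": "postgres", "doc_count": 7},
            ]
        }
    }
}
print(parse_trending_tags(sample_response))
```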
So far the only problem is having to manage ORM objects and Elasticsearch JSON together. In our new application we removed the database and run only on Elasticsearch. This removes some overhead: we used the db only for storage, and since the data is already in Elasticsearch it was just a duplicate.