Hi, I'm new to Elasticsearch and struggling with my index design.
I have two indices: 'comment', which contains comments written by users, and 'ex_comment', which also contains comments, but ones collected by crawling.
Here's what I want to build.
Like Facebook's 'like' feature, I want to show how many users liked each comment.
Since each user should also see whether they have liked a given comment, just adding a 'counter' field isn't enough, so I thought I'd need mapping tables like the ones below.
(indices)
like_comment, like_ex_comment
(fields, the same in both)
"comment_id" // ID from the parent index
"user_id" // user who liked the comment
Each of these would be a child of its respective parent index.
I've heard that Elasticsearch doesn't allow multiple parents, so I thought I'd need two separate likes indices.
If I decide on this design, I'll use a 'has_child' query.
I'm not sure whether this is the best design for managing likes. I'm also worried about performance, since 'like' actions will happen frequently.
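Here is a minimal sketch of one of these likes indices as a standalone index. The dict is the request body you might send with PUT like_comment. It assumes Elasticsearch 7.x or later (typeless mappings) and keyword fields for the IDs. In Elasticsearch 6.x and later, built-in parent/child relations are defined with a join field inside a single index, so they wouldn't be modelled as separate indices like this. The `like_doc_id` helper is hypothetical. It keeps one doc per (comment, user) pair, so a repeated like overwrites the existing doc instead of adding a duplicate.

```python
import json

# Hypothetical mapping for like_comment / like_ex_comment (same fields in both).
# "keyword" suits exact-match IDs used in term queries and terms aggregations.
LIKE_INDEX_MAPPING = {
    "mappings": {
        "properties": {
            "comment_id": {"type": "keyword"},  # ID of the comment in the parent index
            "user_id": {"type": "keyword"},     # user who liked the comment
        }
    }
}


def like_doc_id(comment_id: str, user_id: str) -> str:
    """Hypothetical helper: a deterministic doc _id per (comment, user) pair,
    so indexing the same like twice overwrites instead of duplicating."""
    return f"{comment_id}_{user_id}"


if __name__ == "__main__":
    # Body for: PUT like_comment
    print(json.dumps(LIKE_INDEX_MAPPING, indent=2))
```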
Hi Jenny,
Much depends on the types of queries you want to run.
If you only need to know whether a particular user has liked a particular comment (e.g. to colour a comment's heart icon accordingly when rendering a web page), then all you need is a regular index with comment_id and user_id fields, and you can query that on its own.
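For the heart-icon case, the request bodies could look roughly like the sketch below. The function names are hypothetical, and the field names are the comment_id/user_id fields from the question. The batch variant checks a whole page of comments in one request rather than one request per comment. Its `size` assumes at most one like doc per (comment, user) pair.

```python
def has_liked_query(comment_id: str, user_id: str) -> dict:
    """Body for a likes-index _count request: matches the like doc for this
    exact (comment, user) pair. A count above 0 means the user liked it."""
    return {
        "query": {
            "bool": {
                # filter clauses skip relevance scoring, which isn't needed here
                "filter": [
                    {"term": {"comment_id": comment_id}},
                    {"term": {"user_id": user_id}},
                ]
            }
        }
    }


def liked_on_page_query(user_id: str, comment_ids: list[str]) -> dict:
    """Body for a likes-index _search request: which of the comments on the
    current page has this user liked?"""
    return {
        "size": len(comment_ids),
        "_source": ["comment_id"],
        "query": {
            "bool": {
                "filter": [
                    {"term": {"user_id": user_id}},
                    {"terms": {"comment_id": comment_ids}},
                ]
            }
        },
    }


def liked_ids(search_response: dict) -> set[str]:
    """Collect the liked comment IDs from the hits of a search response."""
    return {hit["_source"]["comment_id"] for hit in search_response["hits"]["hits"]}
```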
If you want to get fancy and query properties of both comments and users in the same request (e.g. find users from London who liked comments about cheese), then you'll need something more: either denormalising the data, or using parent/child routing to keep related data on the same shard.
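Here is a sketch of the denormalising option, assuming you copy the queried properties into each like doc at index time. The copied fields `comment_text` and `user_city`, and the source fields `text` and `city`, are hypothetical. `comment_text` is assumed to be mapped as text and `user_city` as keyword. A single search on the likes index can then answer the London/cheese question. The trade-off is that copies must be updated when the original comment or user changes.

```python
def denormalised_like_doc(comment: dict, user: dict) -> dict:
    """Build a like doc carrying copies of the fields we want to query."""
    return {
        "comment_id": comment["id"],
        "user_id": user["id"],
        "comment_text": comment["text"],  # copied from the comment (text field)
        "user_city": user["city"],        # copied from the user (keyword field)
    }


def users_who_liked_query(topic: str, city: str, max_users: int = 100) -> dict:
    """Body for a likes-index _search: users in `city` who liked comments
    mentioning `topic`, returned as terms buckets on user_id."""
    return {
        "size": 0,  # only the buckets are needed
        "query": {
            "bool": {
                "must": [{"match": {"comment_text": topic}}],
                "filter": [{"term": {"user_city": city}}],
            }
        },
        "aggs": {"users": {"terms": {"field": "user_id", "size": max_users}}},
    }
```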
It's best to consider the questions you want to ask of the data up front.
No problem. Glad it was useful.
Sometimes it's the simple things that are hard. It makes me marvel at how services like Twitter scale: counting likes for every tweet and remembering which tweets each of us liked, going back years.
Hi, using the design you suggested for comment likes, I was able to build it exactly as I expected.
But recently I've got stuck on another problem that I can't figure out.
User comments also need to be shown in descending order of like count.
As you suggested earlier, I created 'likes_index' to store user likes.
I query 'comment_index/_search' and 'likes_index/_search' separately, then merge the results to display them as below (10 rows per page):
Comment1
OO liked this comment
Comment2
OO liked this comment
..
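Here is a sketch of a merge like the one described, assuming each comment's `_id` in comment_index equals the `comment_id` stored in its like docs. The function name is hypothetical. It shows why sorting by likes can't come out of this step: the page order is whatever comment_index/_search returned, and the likes are only attached afterwards.

```python
def merge_page(comment_hits: list[dict], like_hits: list[dict]) -> list[tuple[str, list[str]]]:
    """Attach likers to each comment on the page, keeping comment_index's order."""
    likers: dict[str, list[str]] = {}
    for hit in like_hits:
        src = hit["_source"]
        likers.setdefault(src["comment_id"], []).append(src["user_id"])
    # Page order is exactly the order comment_index/_search returned.
    return [(hit["_id"], likers.get(hit["_id"], [])) for hit in comment_hits]
```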
But how can I get the results in descending order of likes? It's quite easy with an RDB, but I can't work out how to do it here.
I'm not sure whether I need a different design or whether this can be solved with the current one.
When I googled it, I read about 'denormalization', but I'm not sure whether I need it here.
If you don't mind, could you give me some ideas on how to solve this problem?
That should happen naturally in the example I gave. Terms are sorted by doc_count descending, so the more like docs a comment has in the likes index, the higher it appears in the array of top comment terms in the results. I amended my gist with extra like docs to show this effect.
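Here is a sketch of that approach under stated assumptions: a terms aggregation on `comment_id` in likes_index, followed by fetching the matching comments with an `ids` query. It assumes comment `_id`s match the stored `comment_id` values. The helper names are hypothetical. `simulate_terms_order` only reproduces the default "most likes first" ordering locally with `collections.Counter`. Its tie-breaking may differ from Elasticsearch's.

```python
from collections import Counter


def top_liked_comments_agg(size: int = 10) -> dict:
    """Body for likes_index/_search: one terms bucket per comment_id.
    Buckets are ordered by doc_count (number of likes) descending by default."""
    return {
        "size": 0,  # only the buckets are needed, not the like docs themselves
        "aggs": {"top_comments": {"terms": {"field": "comment_id", "size": size}}},
    }


def comments_by_ids_query(comment_ids: list[str]) -> dict:
    """Body for comment_index/_search: fetch the comments named in the buckets."""
    return {"size": len(comment_ids), "query": {"ids": {"values": comment_ids}}}


def order_as_buckets(comment_hits: list[dict], bucket_keys: list[str]) -> list[dict]:
    """The ids query doesn't guarantee order, so re-sort hits to match the buckets."""
    rank = {key: i for i, key in enumerate(bucket_keys)}
    return sorted(comment_hits, key=lambda hit: rank[hit["_id"]])


def simulate_terms_order(like_docs: list[dict], size: int = 10) -> list[tuple[str, int]]:
    """Local illustration of the default ordering: most-liked comments first."""
    return Counter(doc["comment_id"] for doc in like_docs).most_common(size)
```

Comments with no likes have no like docs, so they never show up in the buckets.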
It's also possible to sort aggregation terms by other things, like max date (to get the latest first), but the default sort order by number of docs should work in your favour here.
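Ordering buckets by max date could be sketched as follows. It assumes each like doc has a date field, here called `liked_at` (hypothetical). The terms aggregation is ordered by a single-value `max` sub-aggregation instead of by doc count.

```python
def latest_liked_comments_agg(size: int = 10) -> dict:
    """Body for likes_index/_search: comment buckets ordered by their most
    recent like rather than by like count. 'liked_at' is a hypothetical date field."""
    return {
        "size": 0,
        "aggs": {
            "top_comments": {
                "terms": {
                    "field": "comment_id",
                    "size": size,
                    # order buckets by the single-value sub-aggregation below
                    "order": {"latest_like": "desc"},
                },
                "aggs": {"latest_like": {"max": {"field": "liked_at"}}},
            }
        },
    }
```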
As far as I understand, sorting aggregation terms is only possible with fields from 'likes_index'. Is that right?
I ask because I've heard that Elasticsearch can't do joins, and what I might need is to sort using fields from the comment index.
For now, I only need to search on the 'title' of comments.
e.g. a user searches for 'love' --> fetch the matching comments in descending order of likes.
So I think simply adding a 'title' field to likes_index could be one way to search for comments that contain specific keywords.
I guess this is the best approach I can think of, but please tell me if there are better options.
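If each like doc carries a copy of its comment's title, assumed here to be mapped as a text field, the search could be sketched as below (function name hypothetical). The `match` query keeps only likes on matching comments. The terms buckets then rank those comments by like count, most liked first.

```python
def liked_comments_matching(keyword: str, size: int = 10) -> dict:
    """Body for likes_index/_search, assuming like docs hold a copied 'title'.
    Only likes on comments whose title matches are counted."""
    return {
        "size": 0,
        "query": {"match": {"title": keyword}},
        "aggs": {"top_comments": {"terms": {"field": "comment_id", "size": size}}},
    }
```

One caveat of this design: a matching comment with zero likes has no like docs, so it won't appear in the results. If titles can be edited, every copy in likes_index also has to be updated.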
And in the near future I might need more features, such as Elasticsearch query scoring (on comment_index).
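One option is to denormalise a like counter onto comment_index, using a hypothetical numeric field `like_count` that is updated whenever a like is added or removed. Relevance scoring and popularity can then live in a single comment_index query. The sketch below shows two variants. The first uses `function_score` with `field_value_factor` to blend relevance with likes. The second sorts strictly by `like_count` and uses `_score` only as a tie-breaker, with `from`/`size` paging of 10 rows.

```python
def search_boosted_by_likes(keyword: str) -> dict:
    """Body for comment_index/_search: relevance score scaled by popularity.
    Assumes a hypothetical numeric 'like_count' field on each comment."""
    return {
        "query": {
            "function_score": {
                "query": {"match": {"title": keyword}},
                "field_value_factor": {
                    "field": "like_count",
                    # log10(2 + like_count): grows slowly and stays above 0,
                    # so comments with no likes keep a non-zero score
                    "modifier": "log2p",
                    "missing": 0,
                },
                "boost_mode": "multiply",
            }
        }
    }


def search_sorted_by_likes(keyword: str, size: int = 10, page: int = 0) -> dict:
    """Alternative: strict ordering by like count, relevance as a tie-breaker."""
    return {
        "from": page * size,
        "size": size,
        "query": {"match": {"title": keyword}},
        "sort": [{"like_count": {"order": "desc"}}, "_score"],
    }
```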