I'm building a forum website in PHP with a fairly standard MySQL layout. The relevant MySQL tables are as follows, with the search-relevant columns noted after the colon for each one...
- threads: thread_title, total "likes/upvotes" count, hashtags, username (who started the thread), category id/name
- thread_comments: comment_body, "likes/upvotes" count, hashtags, username
I also have these MySQL tables. But I'm guessing there's no need to store these in Elasticsearch as documents, the related category/hashtag names would instead be stored with the threads/comments in a denormalized fashion?
- categories: category_name
- hashtags: hashtag_name
There will most likely (or at least to begin with) only be one search box on the site, which will look for threads matching the user's query. I'll have it look in the fields: category_name, thread_title, comment_body for every comment in the thread, hashtags and maybe username. The search page would then list all the matching threads. I'm not sure yet if each comment should be treated as a separate search result, or if threads should be grouped together. I guess that decision is a big part of the data layout questions below?
Once I've learnt how, I'll also implement some scoring formula on the fields that also takes the "likes" into account.
How would you recommend laying out this data in Elasticsearch?
Q1: From my research it seems that just using one "Elasticsearch index" makes sense for this seeing there's really only one search box on the site. Or could there be a reason to segment a single forum website into multiple "Elasticsearch indices"?
Q2: When Elasticsearch is just being used to search website content, is it most common to have one "Elasticsearch index" per website? This seems to be the case from what I've been reading, but curious how people are actually using it with a server that only hosts one website, or multiple websites. This is kind of the same as the question above, although Q1 is more about what you recommend for this forum, and Q2 is about what everyone else normally does.
Q3: Should each thread be a single "Elasticsearch document" with a big array containing all its comments? Or does it make more sense to store each thread comment as a separate "Elasticsearch document". My hunch is to make each comment a separate document, but keen for feedback on how you would do it.
Q4: Does it make sense to have two separate document "types"? i.e. a "thread" type, and a "thread_comment" type? I guess this depends on the question above. Maybe I don't even need two types?
Q5: Any opinions on how many levels deep the nesting should go? Coming from an SQL background I'm used to each "row/document" only having one "level" of fields (the table's columns). So I'm wondering if there are some benefits to doing some nesting in how the records are stored Elasticsearch.
Q6: In terms of pushing the data from MySQL to Elasticsearch: I'm using a standard PHP MVC framework (Phalcon) that has an ORM. So does it make sense to just push each record to Elasticsearch after the model is saved? i.e. some code in $model->afterSave() - Or is there another way you would do it?
Q7: Can you see any need to store the hashtags and categories themselves as Elasticsearch documents? Or just include them with linked threads/comments?
Thanks for any feedback. This is my first project using Elasticsearch, and its a fairly important one, so I've love to hear your thoughts based on your experiences using it.