Using ElasticSearch for indexing forum

Hello,

I would like to use ElasticSearch to index a forum database
(containing about 8 million messages) and I'm not sure how to index/
query it properly.

I'm planning to index one document per forum message. Each document
would also have properties like ForumId, TopicId, TopicName, Subject,
Author, Date Posted.

Searching for a list of messages will work perfectly well this way,
but I also need to retrieve a list of Topics: basically a grouping
on TopicId, sorted on date posted or score. Is this something I
can achieve with ElasticSearch, and if so, how?
Could nested objects help in this case?

Many thanks, S3ncha

Anyone - is it a bad idea to use ElasticSearch in this case? I really like
the product and the functionality it provides, but I'm having trouble
figuring out if it's going to be efficient in this scenario (indexing a
forum - searching for messages and returning unique topics).

Any help would be greatly appreciated.
Thanks!

We implement a threaded forum using elasticsearch to hold data. What's your
concern?

-- jim

Hello James,

I would like to use elasticsearch purely for searching and
continue to use another database as the main db that holds the forum data.
Are you storing an individual message or a whole thread (containing all
messages) as a document in elasticsearch? I'm planning to store
messages as documents.

I would like to implement 2 different search options on the forum:

  1. search that would return message snippets.
  2. search that would return list of threads

How can I implement the 2nd option? As I understand it, elasticsearch
doesn't support field collapsing yet, so I can't group documents by
thread id when doing this search.

Alex

I am using ES as the data store also, but that shouldn't impact the
searchability you seek.

Your current data structure probably contains a unique id for each message
and a parent id. I would encourage you to also include a thread id. You
would index into ES a document with a structure like this:

{
message_id: ,
parent_id: , // not really helpful for your use case
thread_id: ,
message:
}

If you want to search by author, sort by date, etc. you will need to include
those fields.
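
As a rough illustration of the structure above, a message row from the main
database could be turned into an index document along these lines. The field
names author and posted_date, the helper name build_message_doc and the sample
values are assumptions for illustration, not part of the structure given above.

```python
import json
from datetime import datetime, timezone


def build_message_doc(message_id, parent_id, thread_id, message, author, posted_date):
    """Return a dict that can be serialized as the JSON body of an index request."""
    return {
        "message_id": message_id,
        "parent_id": parent_id,
        "thread_id": thread_id,
        "message": message,
        # Assumed extra fields, needed only if you search by author or sort by date.
        "author": author,
        "posted_date": posted_date.isoformat(),
    }


# Hypothetical sample row.
doc = build_message_doc(
    101, None, 7, "How do I group by thread?", "alex",
    datetime(2011, 8, 23, 9, 30, tzinfo=timezone.utc),
)
print(json.dumps(doc, indent=2))
```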

If you want to provide a snippet of the message and highlight text that
corresponds to the user's search terms (à la Google and most good forum
searches) you will "store" the message in the index. If you just want a list
of message_ids back, you do not need to store the "message" itself.
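
A minimal sketch of a search body that asks for highlighted snippets on the
message field might look like the following. The helper name
build_snippet_search is made up for this example, and the exact query syntax
can differ between elasticsearch versions, so treat it as a starting point
rather than a tested request.

```python
import json


def build_snippet_search(keyword, size=10):
    """Build a search body that returns highlighted fragments of 'message'."""
    return {
        # query_string search restricted to the message field
        "query": {"query_string": {"default_field": "message", "query": keyword}},
        # ask for highlighted fragments of the message field with default settings
        "highlight": {"fields": {"message": {}}},
        "size": size,
    }


body = build_snippet_search("keyword")
print(json.dumps(body, indent=2))
```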

If you do a search, and the same thread id appears in multiple results, but
you only want the thread id to appear one time, that is what faceting is
used for. Someone else with experience using facets will have to weigh in
here.
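
For what it's worth, a terms facet on thread_id could be requested roughly
like this. This is only a sketch using the facets section of the search body
as it exists in current elasticsearch releases; the helper name
build_thread_facet_search and the facet name "threads" are made up. Note that
a terms facet returns each thread_id with a hit count, ordered by count by
default.

```python
import json


def build_thread_facet_search(keyword, max_threads=20):
    """Build a search body with a terms facet that counts hits per thread_id."""
    return {
        "query": {"query_string": {"default_field": "message", "query": keyword}},
        "facets": {
            # "threads" is an arbitrary label for this facet in the response
            "threads": {"terms": {"field": "thread_id", "size": max_threads}}
        },
    }


facet_body = build_thread_facet_search("keyword")
print(json.dumps(facet_body, indent=2))
```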

Thanks for the reply.

This is pretty much how I intended to store the documents.
I did look at the facets, but if I use the following structure

{
message_id: ,
thread_id: ,
message: ,
posted_date:
}

I don't think I will be able to search the index and return a list of
thread_ids (facet on thread_id) sorted by posted_date.
Am I missing something? I can't find any example that combines facets
and sorting.
I also need to return min(message_id) for each thread_id.

In SQL (T-SQL, for example) it would look like:

select thread_id, min(message_id)
from messages
where contains(message,'keyword')
order by thread_id

Can I do something similar with ES?

Thanks

Sorry, just realised that I didn't include the group by in my SQL example.
It should be:

select thread_id, min(message_id)
from messages
where contains(message,'keyword')
group by(thread_id)
order by thread_id
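
To make the intended result concrete, here is a small client-side sketch of
the same grouping applied to a list of search hits. It assumes each hit has
already been reduced to a dict with thread_id and message_id; the function
name and the sample data are invented. It only groups the hits that were
actually fetched, so it is not a replacement for grouping inside the search
engine.

```python
def group_hits_by_thread(hits):
    """Return (thread_id, min(message_id)) pairs, ordered by thread_id."""
    first_message = {}
    for hit in hits:
        thread_id = hit["thread_id"]
        message_id = hit["message_id"]
        # keep the smallest message_id seen so far for this thread
        if thread_id not in first_message or message_id < first_message[thread_id]:
            first_message[thread_id] = message_id
    return sorted(first_message.items())


# Hypothetical hits for a keyword search.
hits = [
    {"thread_id": 7, "message_id": 103},
    {"thread_id": 3, "message_id": 55},
    {"thread_id": 7, "message_id": 101},
]
print(group_hits_by_thread(hits))  # [(3, 55), (7, 101)]
```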