Indexing a forum with Elasticsearch


(vibrant) #1

Hello,

I'd like to ask what the best practice is for indexing forum threads. My
problem is that a thread could have 200 posts, and if the index holds one
document per thread, then when somebody adds post number 201 I have to
reindex all 201 posts.

Is there any best practice for this with Elasticsearch?

Thanks,


(David Pilato) #2

Hi,

IMO, I always ask myself: "What am I looking for?"
In your case: "Am I looking for threads talking about something?" or "Am I
looking for comments talking about something?"

This can matter in terms of scoring.

Imagine the following case. I have 2 threads:
The first one has 10 comments, each containing the word elasticsearch.
The second one has 1 comment containing the word elasticsearch.

If you index individual comments, all 11 comments will get about the same
score.
If you index threads, the first thread will score higher than the second
one.
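
To make the scoring point concrete, here is a toy sketch in Python. It only counts how often a term occurs, which is just one ingredient of real relevance scoring and not the formula Elasticsearch actually uses. The thread names and comment texts are made up for illustration.

```python
# Toy illustration only: raw term counts, not Elasticsearch's real scoring.

def term_count(text, term):
    """Count whitespace-separated tokens equal to `term`, ignoring case."""
    wanted = term.lower()
    return sum(1 for token in text.lower().split() if token == wanted)

# Hypothetical data: thread-1 has 10 matching comments, thread-2 has 1.
threads = {
    "thread-1": ["I like elasticsearch"] * 10,
    "thread-2": ["I like elasticsearch"],
}

# One document per comment: every comment contains the term exactly once,
# so nothing distinguishes comments of thread-1 from the one in thread-2.
per_comment = [
    term_count(comment, "elasticsearch")
    for comments in threads.values()
    for comment in comments
]

# One document per thread: the comments are searched together, so the
# thread with more mentions ends up with a higher count.
per_thread = {
    thread_id: term_count(" ".join(comments), "elasticsearch")
    for thread_id, comments in threads.items()
}

print(per_comment)
print(per_thread)
```

In the per-comment layout every document looks the same to the search engine, while in the per-thread layout the busier thread stands out.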

That said, I would index threads with their comments, because I'm looking for
threads talking about what I'm searching for and I want to find the most
relevant item.

If you want to find a thread talking about elasticsearch AND lucene, you will
find it if you index the whole thread, even if the two words appear in
different comments. But you won't find it if you index only comments.
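
The AND case can be sketched the same way. This is plain Python set logic standing in for a search engine's boolean AND, with made-up comment texts, so treat it as an illustration of the idea rather than of how Elasticsearch evaluates queries.

```python
def matches_all(text, terms):
    """True if every term appears as a whitespace-separated token in text."""
    tokens = set(text.lower().split())
    return all(term.lower() in tokens for term in terms)

# Hypothetical thread whose two comments each mention only one of the terms.
comments = ["elasticsearch is fast", "it is built on lucene"]
terms = ["elasticsearch", "lucene"]

# One document per comment: no single comment contains both terms.
comment_hits = [c for c in comments if matches_all(c, terms)]

# One document per thread: the combined text contains both terms.
thread_hit = matches_all(" ".join(comments), terms)

print(comment_hits)
print(thread_hit)
```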

Whichever you choose, an array of comments or nested comments, you will have
to send the full thread to Elasticsearch every time a post is added.
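
As a rough sketch of what that means in practice, the Python below builds the body that would have to be re-sent when post 201 arrives. The field names (`title`, `comments`, `author`, `body`) and the helper `add_post` are assumptions made for this example, and no request is actually sent.

```python
import json

def add_post(thread_doc, author, body):
    """Return a new thread document with one more comment appended."""
    updated = dict(thread_doc)
    updated["comments"] = thread_doc["comments"] + [{"author": author, "body": body}]
    return updated

# Hypothetical thread that already holds 200 posts.
thread = {
    "title": "Indexing a forum",
    "comments": [{"author": f"user{i}", "body": f"post {i}"} for i in range(1, 201)],
}

updated = add_post(thread, "user201", "post 201")

# The whole document, all 201 comments included, is the body that would be
# sent to index the thread again.
payload = json.dumps(updated)
print(len(updated["comments"]), "comments,", len(payload), "bytes")
```

The cost of adding a post therefore grows with the length of the thread, which is the trade-off the original question is about.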

That's certainly how I would handle your use case.
HTH
David.

--
David Pilato
http://dev.david.pilato.fr/
Twitter : @dadoonet

(In reply to vibrant's post of 21 November 2011, 10:39.)


(vineeth mohan) #3

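Maybe you can use parent/child documents:
http://www.elasticsearch.org/guide/reference/mapping/parent-field.html
That way each post is a separate child document of its thread, so adding a
post only indexes that one document.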
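
A minimal sketch of the parent/child idea, assuming the `_parent` mapping field and the `parent` request parameter as described on the page linked above. The index name `forum`, the type names `thread` and `post`, and the helper `index_post_request` are hypothetical. Newer Elasticsearch versions replaced `_parent` with a join field, so check the documentation for your version. The code only builds the request and does not send it.

```python
import json

INDEX = "forum"  # hypothetical index name

# Mapping that declares "post" documents as children of "thread" documents
# (assumed `_parent` mapping style of that era).
post_mapping = {"post": {"_parent": {"type": "thread"}}}

def index_post_request(thread_id, post_id, body):
    """Build (method, path, json_body) for indexing one post under its thread."""
    path = f"/{INDEX}/post/{post_id}?parent={thread_id}"
    return "PUT", path, json.dumps({"body": body})

# Adding post 201 to thread 42 touches a single small document.
method, path, payload = index_post_request("42", "201", "post number 201")
print(method, path, payload)
```

The trade-off is that threads and posts are now separate documents, so thread-level relevance has to come from a parent/child query rather than from one combined document.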

Thanks
Vineeth

(In reply to David Pilato's post of Mon, Nov 21, 2011, 4:02 PM.)

