Indexing Messages in Threads, or emulating grouping with parent-child relationship

George_Karpenkov · May 1, 2013, 7:51am

Hi All,

I'm using Elasticsearch to index messages in a chat system. Messages work
sort of like Facebook: users talk to other users, and for a given pair of
users there exists at most one thread. So we have threads with potentially
very many messages, with up to 100 messages posted per second, which we
would like to index in near real time. Given that Lucene can't really do
in-place updates, indexing whole threads as single documents is out of the
question: every new message would mean reindexing the entire thread.

Consequently, we are indexing individual messages. However, we would like
to return the results as "threads".
From the searches I did, grouping is not yet possible with ES. So if I
want, say, 50 threads for the search query "hello", my best bet is to
fetch ~100 messages, group them myself, and hope that they cover
sufficiently many threads.
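
A minimal sketch of that fetch-and-group workaround, assuming each message
hit carries a "thread_id" field and arrives already sorted by score. The
field names and the function name are made up for illustration; nothing
here is an Elasticsearch API.

```python
from collections import OrderedDict


def group_hits_by_thread(hits, max_threads):
    """Group ranked message hits by thread, keeping first-seen (best-score) order."""
    threads = OrderedDict()
    for hit in hits:
        thread_id = hit["thread_id"]  # hypothetical field name
        if thread_id not in threads:
            if len(threads) == max_threads:
                continue  # enough threads collected; ignore hits from new ones
            threads[thread_id] = []
        threads[thread_id].append(hit)
    return threads


# Over-fetched message hits, as they might come back from a search.
hits = [
    {"thread_id": "t1", "text": "hello there", "score": 2.0},
    {"thread_id": "t2", "text": "hello", "score": 1.8},
    {"thread_id": "t1", "text": "oh hello", "score": 1.5},
    {"thread_id": "t3", "text": "hello world", "score": 1.1},
]
grouped = group_hits_by_thread(hits, max_threads=2)
```

If the grouped result has fewer threads than requested, the caller would
have to fetch another page of messages and repeat, which is the weak spot
of this approach.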

However, that does not solve the "relevance" issue: if thread A contains
separate messages "cat" and "dog", I want to see it when I search for "cat
AND dog". Or at least I want it to be more relevant than a thread which
only contains the word "dog".
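
To make the desired behaviour concrete, here is a toy ranking done outside
the search engine. It is only a sketch: the scoring (distinct query terms
covered by the thread, then total term hits) is an assumption for
illustration and has nothing to do with Lucene's actual scoring.

```python
from collections import defaultdict


def rank_threads(messages, query_terms):
    """Rank threads by how many distinct query terms their messages cover."""
    terms = {t.lower() for t in query_terms}
    matched = defaultdict(set)    # thread_id -> distinct query terms seen
    hit_counts = defaultdict(int)  # thread_id -> total term hits
    for msg in messages:
        found = terms & set(msg["text"].lower().split())
        if found:
            matched[msg["thread_id"]] |= found
            hit_counts[msg["thread_id"]] += len(found)
    # More distinct terms first, then more hits, then thread id for stable output.
    return sorted(
        matched,
        key=lambda tid: (-len(matched[tid]), -hit_counts[tid], tid),
    )


messages = [
    {"thread_id": "A", "text": "cat"},
    {"thread_id": "A", "text": "dog"},
    {"thread_id": "B", "text": "dog"},
    {"thread_id": "B", "text": "dog again"},
]
ranking = rank_threads(messages, ["cat", "dog"])
```

Thread A comes first because its messages together cover both terms, even
though no single message contains both.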

Another solution I've found is to make "message" objects children of
"thread" objects and use a has-child filter.
However, that doesn't solve the relevance problem either, because as far
as I understand has-child, it will just give me the first threads which
satisfy the requirements, without ranking them by relevance.
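
One possible direction, sketched as a request body built in Python: use
the has_child query rather than the filter (filters do not contribute to
scoring), with one has_child clause per term so that "cat" and "dog" may
match in different child messages. The child type "message" and the field
"text" are assumptions, and the name of the scoring parameter depends on
the Elasticsearch version (score_type in older releases, score_mode in
later ones), so check it against the version in use.

```python
import json


def thread_query(terms, child_type="message", field="text"):
    """Build a bool query with one has_child clause per required term."""
    return {
        "query": {
            "bool": {
                "must": [
                    {
                        "has_child": {
                            "type": child_type,    # assumed child type name
                            "score_mode": "sum",   # parameter name varies by version
                            "query": {"match": {field: term}},
                        }
                    }
                    for term in terms
                ]
            }
        }
    }


body = thread_query(["cat", "dog"])
print(json.dumps(body, indent=2))
```

This only builds the JSON; it has not been run against a cluster, and
whether the resulting scores are good enough is something to test.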

Any better suggestions? Am I doing something unusual?

Regards,
George

--
You received this message because you are subscribed to the Google Groups "elasticsearch" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elasticsearch+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

aash_dhariya · May 1, 2013, 7:58am

Is it possible to use messages as nested documents of a particular thread?

--
Thanks,
Aash

George_Karpenkov_2 · May 1, 2013, 8:13am

Unfortunately, as far as I understand ES, inserting a nested document will
require reindexing the entire thread.
