Is ES the right approach for a real-time mail/message system?


(zoz) #1

I am building a web email type message system where the system "tags"
messages with "userid_inbox" and other such tags to group the messages
for search and inbox display. Is ES a good solution for this? My
main question is about the performance of near-real-time insert/update
of a single message/doc. Let's say a user moves a message from his
inbox to a trash folder. I would implement this by removing the
"userid_inbox" tag from the message/doc and adding the "userid_trash"
tag. The system would then want to display the new inbox search result
ASAP back to the user with the message removed (i.e. no longer in the
search result because the "userid_inbox" term has been removed).

Is ES a good fit for a system with a high rate of inserts/updates and
a requirement that the latency between an insert/update and its
visibility in search be minimal (e.g. 50 ms)?

Is ES near-real-time indexing OK for UI feedback on index changes
(i.e. real-time changes to the user's view of a search), or should I
be looking at a different solution that combines a real-time component
with ES search behind it for more complicated full-text searching
when needed?

This seems like a common problem (e.g. message boards, blogs, etc.).

Thanks for any advice,
Zoz


(James Cook) #2

We have implemented a commenting system as part of our web site using ES.
The near-real-time (NRT) behavior of ES is something you have to work around
when you have the use cases you describe. The typical way to handle this NRT
problem is to call refresh() on the index prior to performing a query. We
typically get a response from refresh in less than 100 ms under light load.

This may seem like a long time when queries take a few millis, but keep in
mind that this is a delay on a single user's request/response cycle. In the
scheme of things, it is usually quite minor.
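
For illustration, the refresh-then-query pattern described above could be sketched in Python roughly as follows. This is only a sketch under assumptions: `send` is a hypothetical transport function (not part of any ES client) that takes a method, a path, and an optional JSON body and returns the parsed response; the index name and the `tags` field in the usage note are made up. The `_refresh` and `_search` paths are the ES REST endpoints.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical transport: (method, path, body) -> parsed JSON response.
# In a real system this would wrap whatever HTTP client you already use.
Send = Callable[[str, str, Optional[Dict[str, Any]]], Dict[str, Any]]


def refresh_then_search(send: Send, index: str, query: Dict[str, Any]) -> Dict[str, Any]:
    """Force a refresh so recent writes become searchable, then run the search."""
    # Step 1: make pending index changes visible to search.
    send("POST", f"/{index}/_refresh", None)
    # Step 2: run the query against the freshly refreshed index.
    return send("POST", f"/{index}/_search", {"query": query})
```

A call for a user's inbox might look like `refresh_then_search(send, "messages", {"term": {"tags": "42_inbox"}})`. The cost of the extra refresh round trip is paid only by the request that needs to see its own write.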

Our architecture fronts ES with Hazelcast, a distributed in-memory map (IMDB) product.
We never add our objects directly to ES. Instead we push them to Hazelcast,
and Hazelcast stores the object in ES while maintaining a copy in memory.
It's basically a distributed memcache along with many other data structures
and locking support.

With Hazelcast in place, we can perform a union using the newly inserted
item (i.e. comment) and the search result from ES. This approach requires
knowing the id of the newly inserted comment so it can be looked up directly
in Hazelcast. This isn't a problem for us, because the REST call to insert a
comment returns the URL to the comment in the location header, and that URL
returns the id. We could have chosen to return the ID as a response header as
well.
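
The union step described above might look something like this in Python. It is a sketch, not our actual code: a plain dict stands in for the Hazelcast map, and the function name `merge_with_recent` and the `id` field on each item are assumptions made for the example.

```python
from typing import Dict, List, Optional


def merge_with_recent(
    search_hits: List[dict],
    cache: Dict[str, dict],
    new_id: Optional[str],
) -> List[dict]:
    """Union possibly stale search hits with the just-written item from the cache."""
    if new_id is None:
        return list(search_hits)
    item = cache.get(new_id)  # direct lookup by id, no search involved
    if item is None:
        return list(search_hits)
    # If the index has already caught up, the item is in the hits; avoid a duplicate.
    if any(hit["id"] == new_id for hit in search_hits):
        return list(search_hits)
    # Otherwise show the new item first, followed by the indexed results.
    return [item] + list(search_hits)
```

Putting the new item first is a display choice for the example; a real list would likely re-sort by timestamp or whatever ordering the page uses.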

-- jim


(zoz) #3

Thanks James for the insight.

With your thoughts and a search through the forum, I now see that ES near-real-time search works like a batch process that runs every 1 sec or so, and that "refresh" is probably too expensive to call for every small update/insert (if the system has a very large number of changes, etc.).

This was very helpful. I think that if I can give the user some immediate front-end UI feedback for messages/docs he updates or creates (i.e. remove from or add to the list in the GUI depending on tags), then it doesn't matter how long the back-end system takes to merge that change into the indexes for other people to see or for a new search to pick up.
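
A rough Python sketch of that idea, assuming the front end keeps a small map of changes the index has not caught up with yet. The names `apply_pending` and `pending`, and the `id`/`tags` fields, are hypothetical and only for illustration.

```python
from typing import Dict, List, Set


def apply_pending(results: List[dict], pending: Dict[str, dict], tag: str) -> List[dict]:
    """Overlay local, not-yet-indexed changes on a possibly stale result list for `tag`."""
    view: List[dict] = []
    seen: Set[str] = set()
    for msg in results:
        seen.add(msg["id"])
        # Prefer the locally changed version of the message if there is one.
        current = pending.get(msg["id"], msg)
        if tag in current["tags"]:
            view.append(current)
    # Add messages that gained the tag locally but are not in the search result yet.
    for msg_id, msg in pending.items():
        if msg_id not in seen and tag in msg["tags"]:
            view.append(msg)
    return view
```

For example, after moving message `m1` to trash, `pending` would hold `m1` with the `userid_trash` tag, so `m1` drops out of the inbox view even if the search still returns it. Entries can be discarded from `pending` once the index reflects them.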

Thanks,
Zoz


(zoz) #4

After doing a little research I found a good thread in this forum discussing faceted & real time search: http://elasticsearch-users.115913.n3.nabble.com/Greetings-tp195687p195687.html

(It looks like Shay is designing ES with an eye toward real-time search in the future; I think he mentioned he did it for Compass before. The Zoie project also looks interesting for real-time search on top of Lucene.)

Thanks Shay for the amazing system you are building,
Zoz


(Shay Banon) #5

Hi,

Just a note, this is a pretty old thread :). ES already has a pretty broad
set of facets built in, all working with the near-real-time aspect, and they
will work the same once full real time happens.

-shay.banon

