We have implemented a commenting system as part of our web site using ES.
The near-real-time (NRT) behavior of ES is something you have to work around
when you have the use cases you describe. The typical way to handle this NRT
problem is to call refresh() on the index prior to performing a query. We
typically get a response from refresh in less than 100 ms under light load.
This may seem like a long time when queries take a few milliseconds, but keep
in mind that this is a delay on a single user's request/response cycle. In the
scheme of things, it is usually quite minor.
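To make the refresh-before-query idea concrete, here is a rough sketch in Python. It is a toy model, not the ES API: ToyNrtIndex and search_after_refresh are made-up names, and the class only imitates the property that indexed documents stay invisible to search until a refresh happens.

```python
class ToyNrtIndex:
    """Toy stand-in for a near-real-time index (hypothetical, not the ES API)."""

    def __init__(self):
        self._pending = {}     # indexed, but not yet visible to search
        self._searchable = {}  # visible to search

    def index(self, doc_id, doc):
        # New and updated documents wait here until the next refresh.
        self._pending[doc_id] = doc

    def refresh(self):
        # Make everything indexed so far visible to search.
        self._searchable.update(self._pending)
        self._pending.clear()

    def search(self, tag):
        # Return the ids of searchable documents that carry the tag.
        return sorted(
            doc_id
            for doc_id, doc in self._searchable.items()
            if tag in doc["tags"]
        )


def search_after_refresh(index, tag):
    """Refresh first, so the query sees the latest writes."""
    index.refresh()
    return index.search(tag)


idx = ToyNrtIndex()
idx.index("c1", {"tags": ["post_42"]})
stale = idx.search("post_42")                 # comment not visible yet
fresh = search_after_refresh(idx, "post_42")  # comment visible after refresh
```

The cost is one extra round trip per query, which is the sub-100 ms delay mentioned above.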
Our architecture fronts ES with Hazelcast, a distributed in-memory map product.
We never add our objects directly to ES. Instead we push them to Hazelcast,
and Hazelcast stores the object in ES while maintaining a copy in memory.
Hazelcast is basically a distributed memcache with many other data structures
and locking support.
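The write path can be pictured roughly like this. It is a single-process sketch in Python, not Hazelcast code: WriteThroughMap is a made-up name, and a plain dict stands in for ES as the backing store.

```python
class WriteThroughMap:
    """Toy write-through map (hypothetical; Hazelcast itself is distributed)."""

    def __init__(self, persist):
        self._memory = {}
        # persist is any callable(key, value) that writes to the backing store.
        self._persist = persist

    def put(self, key, value):
        self._persist(key, value)  # store in the backing index first
        self._memory[key] = value  # then keep a copy in memory

    def get(self, key):
        # Served from memory, so it does not wait for the index to refresh.
        return self._memory.get(key)


backing_store = {}  # stand-in for ES
comments = WriteThroughMap(backing_store.__setitem__)
comments.put("c1", {"id": "c1", "text": "first!"})
cached = comments.get("c1")
```

The point is that a lookup by id is answered from memory right away, whether or not the index has caught up.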
With Hazelcast in place, we can perform a union using the newly inserted
item (i.e. comment) and the search result from ES. This approach requires
knowing the id of the newly inserted comment so it can be looked up directly
in Hazelcast. This isn't a problem for us, because the REST call to insert a
comment returns the URL to the comment in the location header, and that URL
returns the id. We could have chosen to return the ID as a response header as
well.
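The union step might look something like the sketch below. The function name and the dict shape of a comment (an "id" key) are assumptions for illustration; the only real requirement is that the new comment can be fetched by id and that it is not shown twice once ES has caught up.

```python
def merge_with_new_comment(search_hits, new_comment):
    """Return the search hits plus the new comment, without duplicating it."""
    if new_comment is None:
        return list(search_hits)
    # Drop the indexed copy if the index has already caught up.
    merged = [hit for hit in search_hits if hit["id"] != new_comment["id"]]
    merged.append(new_comment)
    return merged


# Stand-ins: a dict plays the in-memory map, a list plays the ES result.
cache = {"c3": {"id": "c3", "text": "just posted"}}
hits = [{"id": "c1", "text": "older"}, {"id": "c2", "text": "old"}]

page = merge_with_new_comment(hits, cache.get("c3"))
page_ids = [comment["id"] for comment in page]
```

Here the new comment is simply appended; a real page would re-sort the merged list by whatever order the comments are displayed in.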
-- jim
On Tue, Nov 2, 2010 at 11:03 AM, Zoz zozofoz@gmail.com wrote:
I am building a web email type message system where the system "tags"
messages with "userid_inbox" and other such tags to group the messages
for search and inbox display. Is ES a good solution for this? My
main question is regarding performance of single message/doc near real
time insert/update. Let's say a user moves a message from his inbox to
a trash folder. I would implement this by removing the "userid_inbox"
tag from the message/doc and adding the "userid_trash" tag. The
system would then want to display the new inbox search result ASAP
back to the user with the message removed (i.e. no longer in the
search result because the "userid_inbox" term has been removed).
Is ES a good fit for a system where there is a high rate of insert/
update and a requirement that latency between insert/update and search
be minimal (i.e. 50ms or whatever)?
Is ES near real time indexing ok for UI feedback of index changes
(i.e. real time changes to the users view of a search) or should I be
looking at a different solution that combines a real time component
backed by an ES search for more complicated full text searching when
needed?
This seems like a common problem, e.g. message boards, blogs, etc.
Thanks for any advice,
Zoz