Hey,
Thanks for the follow-up, and sorry for not being totally clear. Here
is an explicit example:
A user saves a search for "elasticsearch AND cool" to be alerted on.
Every time a new document comes in, we run every registered user query
against that doc. So if a new doc with id 1 comes in, we end up
with:
(elasticsearch AND cool) AND (_id:1)
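A minimal sketch of the brute-force flow described above, assuming (hypothetically) that every saved query is a plain AND of terms; real query-string parsing is far richer. `parse_and_query`, `tokenize` and `brute_force_matches` are illustrative names, not Elasticsearch APIs. The point it shows is the cost: each new document is checked against every saved query.

```python
import re


def tokenize(text):
    """Lowercased word set for a document (toy analyzer, an assumption)."""
    return {tok.lower() for tok in re.findall(r"\w+", text)}


def parse_and_query(query):
    """Hypothetical simplification: "elasticsearch AND cool" -> {"elasticsearch", "cool"}."""
    return frozenset(part.strip().lower() for part in query.split(" AND "))


def brute_force_matches(doc_id, text, saved_queries):
    """Run every saved query against one new document.

    The work grows with len(saved_queries) for every incoming doc. The
    "(_id:<doc_id>)" clause is implicit here because only this doc is checked.
    """
    doc_terms = tokenize(text)
    alerts = []
    for user, query in saved_queries.items():
        if parse_and_query(query) <= doc_terms:  # all required terms present
            alerts.append((user, doc_id))
    return alerts


saved = {"alice": "elasticsearch AND cool", "bob": "rabbitmq AND alerts"}
print(brute_force_matches("1", "Elasticsearch is cool", saved))  # [('alice', '1')]
```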
I'm considering wrapping each part in a filter query to keep the
overhead low. At the moment these are two separate query strings, but it'd
be easy to change things on my side so the _id match is a term query.
Like I said, it's not efficient, but it's simple to understand and work with.
If it matches, we fire an alert.
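One way to express "both parts as filters, with the _id match as a term query" is sketched below. It is written against the current Elasticsearch query DSL (a `bool` query with a `filter` clause); the 2010-era DSL spelled this differently, so treat the exact request shape as an assumption. `alert_check_body` is a hypothetical helper name.

```python
import json


def alert_check_body(user_query, doc_id):
    """Request body that matches doc_id only if it also satisfies user_query.

    Both clauses sit in filter context: no scoring is needed (we only care
    whether the doc matches), and filter clauses are eligible for caching.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    # The user's saved search, still as a query string.
                    {"query_string": {"query": user_query}},
                    # The new document, as a term query on _id instead of a
                    # second query string.
                    {"term": {"_id": doc_id}},
                ]
            }
        }
    }


print(json.dumps(alert_check_body("elasticsearch AND cool", "1"), indent=2))
```

A non-zero hit count for this body would then be the signal to fire the alert.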
I believe the alerting itself should be done outside of ES, and I think
(I really haven't looked; we have a custom setup) it could be handled
with RabbitMQ or something similar.
What would be cool, though I have no idea how feasible it is, would be to:
- Register queries against an index
- When a document is indexed, include details in the response listing which
queries the document matched
- Let the application logic handle any alerting or other actions
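The register-then-report idea in the list above can be sketched in memory. This is a hypothetical illustration, not an existing Elasticsearch feature or API: queries are again assumed to be plain ANDs of terms, and `QueryRegistry`, `register`, `index` and the `matches` key are made-up names. It also indexes the queries by term, so a new doc is only checked against queries that share at least one term with it rather than against every registered query.

```python
import re
from collections import defaultdict


def tokenize(text):
    return {tok.lower() for tok in re.findall(r"\w+", text)}


class QueryRegistry:
    """Hypothetical registry: register queries once, and indexing a document
    reports which of them it matched."""

    def __init__(self):
        self.queries = {}                # query_id -> frozenset of required terms
        self.by_term = defaultdict(set)  # term -> ids of queries containing it

    def register(self, query_id, query):
        terms = frozenset(t.strip().lower() for t in query.split(" AND "))
        self.queries[query_id] = terms
        for term in terms:
            self.by_term[term].add(query_id)

    def index(self, doc_id, text):
        doc_terms = tokenize(text)
        # Only queries sharing a term with the doc can possibly match.
        candidates = set().union(*(self.by_term.get(t, set()) for t in doc_terms))
        matched = sorted(q for q in candidates if self.queries[q] <= doc_terms)
        # Mimic an index response that also lists the matching queries;
        # alerting is left to the application.
        return {"_id": doc_id, "matches": matched}


reg = QueryRegistry()
reg.register("alice-1", "elasticsearch AND cool")
reg.register("bob-1", "rabbitmq AND alerts")
print(reg.index("1", "Elasticsearch is cool"))  # {'_id': '1', 'matches': ['alice-1']}
```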
Thanks,
Paul
On Nov 4, 1:01 pm, Shay Banon shay.ba...@elasticsearch.com wrote:
Hey,
On Thu, Nov 4, 2010 at 5:36 PM, Paul ppea...@gmail.com wrote:
Hello,
We have a system where we check every new document that comes in
against a large number of saved user queries. We currently do this by
running <USER_QUERY> AND <NEWDOCUMENT>. It isn't the most
efficient setup, since for every doc we need to run every saved user
query.
What exactly is the NEWDOCUMENT part? How do you create it?
So, there are two questions:
- Within ES, is there a different approach that could be taken to
streamline this?
There are different ways to implement this internally, certainly with lower
overhead than what you are doing now. One of the main problems here
is the messaging aspect, which raises several additional questions. Can
messages be lost? Are duplicates allowed? If a client registered to receive
notifications is down, do docs need to be replayed for it? And, of
course, there is actually implementing the notification protocol itself.
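The delivery questions above (loss, duplicates, replay for a client that was down) can be made concrete with a small in-memory log. This is a hypothetical sketch, not how ES or RabbitMQ work internally; a real setup would rely on a durable broker and persistent storage. `NotificationLog`, `publish`, `pending` and `ack` are illustrative names.

```python
from collections import defaultdict


class NotificationLog:
    """Per-client notification log: entries are kept until acknowledged
    (no loss), (query_id, doc_id) pairs are deduplicated (no duplicates),
    and a client that was offline gets everything unacknowledged (replay)."""

    def __init__(self):
        self.entries = defaultdict(list)  # client -> list of (query_id, doc_id)
        self.seen = defaultdict(set)      # client -> keys already logged
        self.acked = defaultdict(int)     # client -> count of acknowledged entries

    def publish(self, client, query_id, doc_id):
        key = (query_id, doc_id)
        if key in self.seen[client]:
            return False  # duplicate alert suppressed
        self.seen[client].add(key)
        self.entries[client].append(key)
        return True

    def pending(self, client):
        """Everything not yet acknowledged, e.g. replayed on reconnect."""
        return self.entries[client][self.acked[client]:]

    def ack(self, client, count):
        self.acked[client] = min(self.acked[client] + count, len(self.entries[client]))


log = NotificationLog()
log.publish("alice", "alice-1", "1")
log.publish("alice", "alice-1", "1")  # duplicate, ignored
log.publish("alice", "alice-1", "7")
print(log.pending("alice"))  # [('alice-1', '1'), ('alice-1', '7')]
log.ack("alice", 1)
print(log.pending("alice"))  # [('alice-1', '7')]
```

Note that the dedup set grows without bound here; a real system would need to expire old keys, which in turn reopens the duplicates question.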
- When running the USER_QUERY AND NEWDOC query, what kind of caching
makes the most sense? From my limited understanding, it'd make
sense to filter-cache both the USER_QUERY and the NEWDOC query, since
both of these should get repeated multiple times.
It depends on what the answer for NEWDOC is. Is it a term query with the
doc id?
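Why the NEWDOC answer matters for caching can be seen in a toy model. `FilterCache` is a hypothetical stand-in that just counts hits and misses; it is not Elasticsearch's actual filter cache, and the queries and doc ids are made up. If NEWDOC is a term query on the doc id, that filter is reused across the user queries for one document but never again afterwards, while each user-query filter keeps getting reused as new documents arrive.

```python
class FilterCache:
    """Toy cache that counts hits and misses per key."""

    def __init__(self):
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def get(self, key, compute):
        if key in self.cache:
            self.hits += 1
        else:
            self.misses += 1
            self.cache[key] = compute()
        return self.cache[key]


user_cache, id_cache = FilterCache(), FilterCache()
user_queries = ["elasticsearch AND cool", "rabbitmq AND alerts"]

for doc_id in ["1", "2", "3"]:
    for q in user_queries:
        # USER_QUERY filter: same key for every incoming document.
        user_cache.get(("query_string", q), lambda: f"bitset for {q}")
        # NEWDOC filter: a new key per document, dead once that doc is done.
        id_cache.get(("term", "_id", doc_id), lambda: f"bitset for _id:{doc_id}")

print(user_cache.hits, user_cache.misses)  # 4 2
print(id_cache.hits, id_cache.misses)      # 3 3
```

The user-query entries pay off more with every new document, whereas the per-doc entries only ever serve a single document and then just occupy cache space.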
Thanks!
Paul