Of course, sorry. I should have described our setup in more detail from the
start.
We are essentially building a traditional booking system backed by ES. We
query it and rely on facets to let the user select a set of "products" (one
product == one document), anywhere from one to roughly 1,000,000 of them.
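
For illustration, the faceted product query is along these lines with the
pre-1.0 Java client API (index and field names here are made up for the
example):

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.facet.FacetBuilders;

public class ProductSearch {
    // Query the product index and facet on a category field so the user
    // can narrow the selection; all names are illustrative only.
    public static SearchResponse searchProducts(Client client) {
        return client.prepareSearch("products")
                .setQuery(QueryBuilders.matchAllQuery())
                .addFacet(FacetBuilders.termsFacet("by_category").field("category"))
                .setSize(100)
                .execute().actionGet();
    }
}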
Once a booking has been finalized, we need to ensure the ordered products
are unavailable to other users on the date specified at the time of the
order. To avoid writing availability data back into the index, we keep a
Redis store and use a custom (native) scorer which dynamically assigns each
document a score reflecting that product's availability on the queried
date. Using min_score we can then filter out the unavailable documents. In
short, we keep availability in Redis and filter ES's hits based on that
data.
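
Stripped of the actual plugin wiring (and with made-up names), the scoring
logic boils down to something like this:

import java.util.BitSet;

public class AvailabilityScorer {
    // Bitset mirroring Redis for the queried date: bit set == product booked.
    private final BitSet bookedOnDate;

    public AvailabilityScorer(BitSet bookedOnDate) {
        this.bookedOnDate = bookedOnDate;
    }

    // Score each document by availability; running the search with
    // min_score: 0.5 then drops every booked product from the hits.
    public double score(int productId) {
        return bookedOnDate.get(productId) ? 0.0 : 1.0;
    }
}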
To speed things up, our ES plugin keeps an in-memory bitmap/bitset of the
availability data, so we don't have to hit Redis for every document while
scoring.
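
The bitset is rebuilt from Redis per queried date; simplified, and with a
made-up key scheme ("unavailable:<date>"), it looks like this using Jedis:

import java.util.BitSet;
import redis.clients.jedis.Jedis;

public class AvailabilityCache {
    // Rebuild the per-date bitset from the Redis set of booked product IDs,
    // so scoring only touches memory. Key naming is illustrative.
    public static BitSet loadBooked(Jedis jedis, String date, int maxProductId) {
        BitSet booked = new BitSet(maxProductId);
        for (String id : jedis.smembers("unavailable:" + date)) {
            booked.set(Integer.parseInt(id));
        }
        return booked;
    }
}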
Our challenge is to write the purchased document IDs into Redis as quickly
as possible upon order finalization. For that to happen, we need to iterate
over the hits and add each ID to the appropriate set in Redis. That loop is
currently our one bottleneck and causes a big delay.
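
For reference, the finalization step is conceptually just this loop
(simplified, same made-up key scheme as above):

import redis.clients.jedis.Jedis;

public class OrderFinalizer {
    // One SADD per purchased ID, i.e. one Redis round trip each -- this
    // serial loop is where the delay comes from.
    public static void markBooked(Jedis jedis, String date, Iterable<String> productIds) {
        for (String id : productIds) {
            jedis.sadd("unavailable:" + date, id);
        }
    }
}

Batching the SADDs through jedis.pipelined() (with a final sync()) cuts
down the round trips, but having to iterate up to a million hits in the
first place is the part we would love to avoid.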
I am not necessarily looking for "the one true solution", but I would love
to hear how you have solved similar challenges, if you have. I would really
appreciate your thoughts and ideas on handling availability data over
fairly large data sets.
Thanks again.
On Mon, Jan 21, 2013 at 4:58 PM, Jörg Prante joergprante@gmail.com wrote:
Can you please elaborate on "processing the ID field from potentially
millions of records" and "fast-enough experience"? What is your usecase
like?
I am afraid I do not fully understand because I know of no other method
than retrieving the documents by get, multi get, scrolling over a result
set after a scan query, or a simple query.
Best regards,
Jörg