Pagination and real-time indexing

Hi guys,

We have the following situation on our hands:

We have more than 28,000,000 documents in one index.

A background process keeps adding new documents to the index and may also update existing ones.

We have a page where a search returns 10 results. After, say, 20 minutes, the user clicks a "LOAD MORE" button that loads the next 10 results below the existing ones, using the same query but different from/size parameters.
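In Elasticsearch terms, a "LOAD MORE" button on top of from/size pagination just re-runs the same query with a larger from offset. A minimal sketch that only builds the request body (nothing is sent); the match query on a "title" field is a hypothetical example, not something from the thread:

```python
def load_more_body(query, page_number, page_size=10):
    """Build the search body for one "LOAD MORE" click using from/size.

    page_number 0 is the initial search; each click increments it.
    Every click re-executes the query from scratch against the index as it
    is *now*, not as it was when the first page was served.
    """
    return {
        "query": query,
        "from": page_number * page_size,  # number of hits to skip
        "size": page_size,                # number of hits to return
    }


# Hypothetical query on a hypothetical "title" field.
query = {"match": {"title": "pagination"}}
first = load_more_body(query, 0)   # hits 0..9
second = load_more_body(query, 1)  # hits 10..19
```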

This happens fairly often: newly indexed documents rank above the ones that were relevant for the first batch, which pushes earlier results down, so the second batch contains duplicates of documents the user has already seen.
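The duplicates follow directly from offset-based paging: the second request skips a fixed number of hits, so every new document that ranks above the old ones shifts an earlier result into the next window. A small pure-Python simulation with made-up document IDs:

```python
def page(ranked_ids, page_number, page_size=10):
    """Slice one page out of a ranked list, the way from/size does."""
    start = page_number * page_size
    return ranked_ids[start:start + page_size]


# Ranking at the time of the first search (made-up IDs, best hit first).
ranking = [f"doc-{i}" for i in range(30)]
first_batch = page(ranking, 0)          # doc-0 .. doc-9

# Meanwhile the background process indexes three documents that score higher.
ranking = ["new-a", "new-b", "new-c"] + ranking
second_batch = page(ranking, 1)         # now doc-7 .. doc-16

duplicates = sorted(set(first_batch) & set(second_batch))
print(duplicates)  # ['doc-7', 'doc-8', 'doc-9']
```

Note that the three new documents are never shown to this user either, because they landed in positions that were already served.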

We wish to create a consistent interface.

I've read the guide, and the obvious choice would be ES scrolls. However, the guide clearly states that scrolls are not meant to serve user requests, I guess for obvious reasons.

Any suggestions on how to keep the result list consistent with the moment the search was originally executed?

The scenario:

  1. The user executes a search and 10 results are loaded.
  2. The user clicks the "LOAD MORE" button 2 hours later.
  3. The second batch of results should come from the original query as it stood at search time, regardless of the changes the background process made to the index in the meantime.
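One partial workaround, not suggested in the thread and only possible if documents carry a timestamp: record when the first search ran and restrict every follow-up page to documents indexed before that moment. This sketch assumes a hypothetical "indexed_at" date field that the background process sets on every write. It hides documents added after the search, but it cannot bring back the old version of a modified document (that document would simply drop out of later pages), and relevance scores can still drift slightly because term statistics change as documents are added.

```python
from datetime import datetime, timezone


def frozen_page_body(query, searched_at, page_number, page_size=10):
    """Search body that only matches documents indexed before the first search.

    Assumes a hypothetical "indexed_at" date field maintained by the indexer.
    """
    return {
        "query": {
            "bool": {
                "must": [query],
                # Filter context: excludes newer documents without affecting scoring.
                "filter": [
                    {"range": {"indexed_at": {"lte": searched_at.isoformat()}}}
                ],
            }
        },
        "from": page_number * page_size,
        "size": page_size,
    }


# Captured once when the user runs the search, then reused for every "LOAD MORE".
searched_at = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
query = {"match": {"title": "pagination"}}
body = frozen_page_body(query, searched_at, page_number=1)
```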

Thanks,
Peter

I don't see many solutions:

  • Scroll, but remember that it will keep segments around for a looooong time if you set the scroll timeout to something like 1 hour, unless you can detect that the user actually left the result page and fire a clear scroll request.
  • Do it on the client side: load the first 10,000 docs into your application and keep them there. You'll have to handle the case where the user wants more than 10k results.
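For the scroll option, the three requests involved look roughly like this. The sketch only builds (method, path, body) tuples instead of sending them; the index name "my-index" and the scroll ID value are placeholders. The "_scroll_id" returned by the first response would have to be stored in the user's session, and each follow-up request renews the keep-alive, which is why an abandoned page holds its segments until the timeout expires unless the scroll is cleared.

```python
def open_scroll(index, query, page_size=10, keep_alive="1h"):
    """Initial search: runs the query and keeps a frozen view of the index alive."""
    return ("POST", f"/{index}/_search?scroll={keep_alive}",
            {"size": page_size, "query": query})


def next_scroll_page(scroll_id, keep_alive="1h"):
    """Handle a "LOAD MORE" click: fetch the next batch and renew the keep-alive."""
    return ("POST", "/_search/scroll",
            {"scroll": keep_alive, "scroll_id": scroll_id})


def clear_scroll(scroll_id):
    """Release the scroll context (and the segments it pins) before the timeout."""
    return ("DELETE", "/_search/scroll", {"scroll_id": scroll_id})


first_request = open_scroll("my-index", {"match": {"title": "pagination"}})
more_request = next_scroll_page("placeholder-scroll-id")
cleanup_request = clear_scroll("placeholder-scroll-id")
```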
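The client-side option amounts to taking a snapshot of hit IDs at search time and paging through it in the application. In this minimal sketch, fetch_ids is a hypothetical stand-in for a single Elasticsearch request with size set to the cache limit; 10,000 matches Elasticsearch's default index.max_result_window, the largest from + size a single search accepts out of the box. Only IDs are cached here, so the documents for each page would still have to be fetched (for example with a multi-get), and those fetches would see any edits made since the search.

```python
class CachedResults:
    """Per-user snapshot of the first `limit` hit IDs, taken once at search time.

    fetch_ids is a hypothetical callable that runs the query once with
    size=limit and returns the matching IDs in rank order.
    """

    def __init__(self, fetch_ids, limit=10_000):
        self._ids = list(fetch_ids(limit))
        # Getting exactly `limit` IDs means there may be more hits than we cached.
        self.truncated = len(self._ids) >= limit

    def page(self, page_number, page_size=10):
        """Serve a page from the snapshot; later indexing cannot shift it."""
        start = page_number * page_size
        return self._ids[start:start + page_size]


def fake_fetch(limit):
    # Stand-in for the real Elasticsearch call: 25 matching documents.
    return [f"doc-{i}" for i in range(min(limit, 25))]


results = CachedResults(fake_fetch)
```

When a user pages past the snapshot and truncated is true, the application has to fall back to something else, such as a fresh search; that is the "more than 10k" case.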

My 2 cents
