Is there a way to determine the timestamp up to which an index is consistent?

We're designing a system that indexes a large number of documents, and we want to be able to determine a time watermark: a timestamp for which we can claim that every document inserted or updated in an index before it is included in search results.

Is there an API call for this? Or could we add a set of "sentinel" documents that we update and then survey at regular intervals via search to determine which documents are queryable?

It sounds like you want the refresh API. If a refresh is successful then all documents acknowledged before the refresh API was called will be visible to searches.
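A minimal sketch of turning that guarantee into a watermark, in Python with only the standard library. It assumes a hypothetical unauthenticated cluster at `http://localhost:9200`, and the names `http_post` and `refresh_watermark` are made up for illustration. The client clock is read immediately before `POST /<index>/_refresh` is sent; if the response's `_shards.failed` count is zero, that reading is returned as the watermark. Treating "no failed shard copies" as "successful" is an assumption you may want to tighten (for example, also requiring `successful == total`).

```python
import json
import time
import urllib.request
from typing import Callable, Optional

# Hypothetical cluster address; adjust for your deployment (TLS, auth, etc.).
ES_URL = "http://localhost:9200"


def http_post(url: str) -> dict:
    """Send a POST with an empty body and decode the JSON response."""
    req = urllib.request.Request(
        url, data=b"", method="POST", headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def refresh_watermark(
    index: str,
    post: Callable[[str], dict] = http_post,
    clock: Callable[[], float] = time.time,
) -> Optional[float]:
    """Refresh `index` and return a visibility watermark, or None on failure.

    The clock is read *before* the refresh request is sent. Writes whose
    acknowledgment was recorded before the returned value (on this same
    clock) were acknowledged before the refresh was called, so they should
    be visible to searches once this function returns.
    """
    called_at = clock()
    body = post(f"{ES_URL}/{index}/_refresh")
    # Treat a missing "failed" count as a failure rather than guessing.
    failed = body.get("_shards", {}).get("failed", 1)
    return called_at if failed == 0 else None
```

The watermark only covers writes whose acknowledgment time your indexing code recorded on the same clock; it says nothing about timestamps stored inside the documents themselves. If several machines write to the index, their clocks would need to agree closely enough for that comparison to be meaningful.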

Based on what I've observed using the refresh API, I don't think this would scale.

Our distributed application ingests thousands of updates per second across multiple tenants. When a policy changes, we need to run a text search over previously ingested documents to find exceptions to the new rules, while also applying those rules to the ingestion feed. We're trying to close the window between enacting a new policy and searching the previously indexed data, so that we catch every document ingested before the policy was applied.

In what sense doesn't it scale? How often do these policies change? As I understand it, you'd only need to refresh once per policy change.
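Refreshing once per policy change only closes the window if the steps happen in the right order. A minimal sketch in Python, assuming (hypothetically) that the ingestion feed checks each document against the active policy before writing it, and records which policy version each in-flight write was checked under. The sequence is: switch the feed to the new policy, wait until every write checked under an older policy has been acknowledged, refresh, then search the existing documents. By the refresh guarantee described earlier in the thread, everything acknowledged before the refresh call is searchable, and everything checked after the switch was checked under the new policy. Documents written in between are covered twice, which is harmless if flagging an exception is idempotent. `PolicyGate`, `apply_policy_change`, and the `refresh`/`search_backlog` callables are hypothetical names; in practice `refresh` would call the `_refresh` endpoint and `search_backlog` would run your query.

```python
import threading
from typing import Callable, Dict, Optional, TypeVar

T = TypeVar("T")


class PolicyGate:
    """Hypothetical helper: tracks unacknowledged writes per policy version."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._version = 0
        self._in_flight: Dict[int, int] = {}

    def begin_write(self) -> int:
        """Call before checking a document; check it against the returned version."""
        with self._cond:
            v = self._version
            self._in_flight[v] = self._in_flight.get(v, 0) + 1
            return v

    def end_write(self, version: int) -> None:
        """Call once the index request for that document has been acknowledged."""
        with self._cond:
            self._in_flight[version] -= 1
            self._cond.notify_all()

    def activate_next(self) -> int:
        """Route new writes to the next policy version; return the previous one."""
        with self._cond:
            previous = self._version
            self._version += 1
            return previous

    def drain(self, up_to: int, timeout: Optional[float] = None) -> bool:
        """Wait until no write checked under a version <= up_to is unacknowledged."""
        with self._cond:
            return self._cond.wait_for(
                lambda: all(n == 0 for v, n in self._in_flight.items() if v <= up_to),
                timeout,
            )


def apply_policy_change(
    gate: PolicyGate,
    refresh: Callable[[], None],
    search_backlog: Callable[[], T],
    timeout: Optional[float] = 30.0,
) -> T:
    previous = gate.activate_next()        # 1. new writes are checked under the new policy
    if not gate.drain(previous, timeout):  # 2. old-policy writes are all acknowledged
        raise TimeoutError("writes checked under the old policy are still in flight")
    refresh()                              # 3. everything acknowledged so far becomes searchable
    return search_backlog()                # 4. search the existing documents
```

Skipping step 2 would leave a gap: a document that passed the old policy's check but was acknowledged only after the refresh call would be neither re-checked by the feed nor guaranteed to be visible to the backlog search.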

The policy changes are controlled by our customers, so the rate varies from a couple of hundred to thousands of times a day, all while the infrastructure ingests millions of document updates a day. As our customer base grows, the number of policy changes and of regularly running background queries grows with it.

I'll admit that the index I call the _refresh API on most often has a 60 s refresh interval, because its documents are fairly large and heavily nested and it mainly drives a UI. When I call _refresh on it, it normally takes a couple of seconds to return. The indices I'm trying to ensure consistency on have refresh intervals of 500 ms to 1 s. Should I expect a _refresh call on those to return in milliseconds?
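One way to answer this for a particular cluster is to measure it. A minimal timing sketch in Python, again assuming a hypothetical unauthenticated cluster at `http://localhost:9200` and stdlib HTTP; `time_refresh` and the default of five samples are illustrative choices, not anything Elasticsearch defines. It times several consecutive `_refresh` calls against one index and returns the elapsed seconds for each.

```python
import json
import time
import urllib.request
from typing import Callable, List

# Hypothetical cluster address; adjust for your deployment (TLS, auth, etc.).
ES_URL = "http://localhost:9200"


def http_post(url: str) -> dict:
    """Send a POST with an empty body and decode the JSON response."""
    req = urllib.request.Request(
        url, data=b"", method="POST", headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def time_refresh(
    index: str,
    samples: int = 5,
    post: Callable[[str], dict] = http_post,
    clock: Callable[[], float] = time.perf_counter,
) -> List[float]:
    """Return the elapsed time, in seconds, of `samples` consecutive refreshes."""
    durations = []
    for _ in range(samples):
        start = clock()
        post(f"{ES_URL}/{index}/_refresh")
        durations.append(clock() - start)
    return durations
```

For example, `statistics.median(time_refresh("my-index"))` (with `my-index` as a placeholder name) gives a typical latency for that index. The numbers will depend on your documents, your indexing load at the time, and your hardware, so it is worth sampling under realistic ingest traffic.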

I don't think I understand the problem. 1,000 times per day is less than once per minute, so there should be no problem refreshing at that frequency. I also don't see why it's a problem if a refresh takes a couple of seconds to complete.
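The arithmetic behind this, as a quick Python check. At 1,000 policy changes per day, changes arrive about 0.69 times per minute, or one every 86.4 seconds on average. Taking "a couple of seconds" as an assumed 2 s per refresh, and assuming one non-overlapping refresh per change, refreshing would occupy roughly 2.3% of the day. The 200 and 5,000 per-day figures in the loop are illustrative stand-ins for "a couple of hundred" and "thousands"; the function names are made up for this example.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
MINUTES_PER_DAY = 24 * 60       # 1,440


def changes_per_minute(changes_per_day: float) -> float:
    """Average rate of policy changes per minute."""
    return changes_per_day / MINUTES_PER_DAY


def mean_gap_seconds(changes_per_day: float) -> float:
    """Average time between consecutive policy changes, in seconds."""
    return SECONDS_PER_DAY / changes_per_day


def refresh_busy_fraction(changes_per_day: float, refresh_seconds: float) -> float:
    """Fraction of the day spent refreshing if every change triggers one
    non-overlapping refresh lasting `refresh_seconds`."""
    return changes_per_day * refresh_seconds / SECONDS_PER_DAY


for per_day in (200, 1_000, 5_000):
    print(
        f"{per_day:>5}/day: {changes_per_minute(per_day):.2f}/min, "
        f"one every {mean_gap_seconds(per_day):.1f} s, "
        f"{refresh_busy_fraction(per_day, 2.0):.1%} of the day refreshing"
    )
```

These are daily averages; bursts of changes from many tenants at once would arrive closer together than the mean gap suggests.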
