How often is a shard actually refreshed?

Hi,

I am trying to better understand how often a shard is actually refreshed compared with its configured refresh interval.

For some context: we are trying to calculate the indexing lag in our indexing pipeline, and the refresh interval is the missing piece. We know the data has been written to the cluster within x seconds, but it is not searchable until the next refresh. So to calculate the lag from our users' perspective, one thought was to add the refresh interval to our existing p99 indexing lag metric.
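To make the idea concrete, here is a minimal sketch of that calculation. It assumes refreshes fire exactly on schedule and that writes arrive uniformly within an interval, neither of which is confirmed in this thread. The function names and the 2.0 s p99 value are hypothetical.

```python
def worst_case_visible_lag(p99_write_lag_s: float, refresh_interval_s: float) -> float:
    """Write lag plus one full interval: a document written just after a
    refresh waits the whole interval (assumes refreshes run on schedule)."""
    return p99_write_lag_s + refresh_interval_s


def average_refresh_wait(refresh_interval_s: float) -> float:
    """Mean wait for the next refresh, assuming uniformly distributed writes."""
    return refresh_interval_s / 2.0


if __name__ == "__main__":
    p99_write_lag_s = 2.0  # hypothetical measured value, not from the thread
    for interval_s in (1, 5, 30, 60):
        worst = worst_case_visible_lag(p99_write_lag_s, interval_s)
        mean_wait = average_refresh_wait(interval_s)
        print(f"interval={interval_s}s worst_case={worst}s mean_refresh_wait={mean_wait}s")
```

If refreshes turn out to be less frequent than the configured interval, the worst-case figure here would underestimate the real lag.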

As a test, I did the following:

I created 4 indices with one shard each and refresh intervals of 1s, 5s, 30s and 60s. I then ran a worker for 12 minutes (720s) inserting documents and looked at the total refresh count reported in the refresh statistics of the _stats API; the counts are listed below.

This is a sample of the refresh section that the _stats API returns at the shard level:

"refresh": {
    "external_total": 92491399,
    "external_total_time_in_millis": 3670820403,
    "listeners": 0,
    "total": 93745384,
    "total_time_in_millis": 3604639145
},

Total refresh count over the 720s run, per refresh interval:

  • 1s - 188 times
  • 5s - 84 times
  • 30s - 27 times
  • 60s - 15 times
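For reference, the counts above can be set against what a strict schedule would give. This is a small sketch using only the numbers from this post; the function name is made up for illustration.

```python
RUN_SECONDS = 720  # 12-minute test run from the post

# Observed refresh counts from the post, keyed by refresh interval in seconds.
OBSERVED = {1: 188, 5: 84, 30: 27, 60: 15}


def compare(observed: dict, run_seconds: int) -> list:
    """Return (interval_s, expected_count, observed_count, effective_interval_s) rows."""
    rows = []
    for interval_s, count in sorted(observed.items()):
        expected = run_seconds / interval_s  # one refresh per interval
        effective = run_seconds / count      # average gap actually seen
        rows.append((interval_s, expected, count, effective))
    return rows


if __name__ == "__main__":
    for interval_s, expected, count, effective in compare(OBSERVED, RUN_SECONDS):
        print(f"{interval_s:>2}s: expected {expected:.0f}, observed {count}, "
              f"average gap {effective:.2f}s")
```

With these numbers the 1s index averages about 3.8s between refreshes and the 5s index about 8.6s, while the 30s and 60s indices were refreshed slightly more often than the schedule alone would predict (27 vs 24 and 15 vs 12).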

I am a bit surprised by the 1s result, as I was expecting to see around 500 - 600 refreshes if the shard is refreshed every second. The higher the refresh interval, the closer the count is to what I would expect. So my questions are:

  • Is there a better metric to see how often a shard is getting refreshed?
  • Does a 1s refresh interval not mean the shard is always refreshed every 1s? Could there be some delay? Do other factors matter here?
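One way to look at this without a full test run is to sample the refresh counter twice and divide the elapsed time by the difference. The sketch below is only an illustration: it assumes an unsecured cluster reachable at a base URL such as http://localhost:9200, and the index name in the usage comment is hypothetical. It reads `indices.<index>.primaries.refresh.total` from `GET /<index>/_stats/refresh`.

```python
import json
import time
import urllib.request


def refresh_total(stats: dict, index: str) -> int:
    """Extract the primary-shard refresh counter from a parsed _stats response."""
    return stats["indices"][index]["primaries"]["refresh"]["total"]


def seconds_per_refresh(before: int, after: int, elapsed_s: float):
    """Average gap between refreshes over the window, or None if none occurred."""
    delta = after - before
    return elapsed_s / delta if delta > 0 else None


def fetch_stats(base_url: str, index: str) -> dict:
    """Fetch only the refresh section of the index stats."""
    with urllib.request.urlopen(f"{base_url}/{index}/_stats/refresh") as resp:
        return json.load(resp)


def measure_refresh_gap(base_url: str, index: str, window_s: float):
    """Sample the counter at the start and end of a window of window_s seconds."""
    first = refresh_total(fetch_stats(base_url, index), index)
    time.sleep(window_s)
    second = refresh_total(fetch_stats(base_url, index), index)
    return seconds_per_refresh(first, second, window_s)


# Example call (hypothetical URL and index name):
# print(measure_refresh_gap("http://localhost:9200", "refresh-test-1s", 60))
```

This only gives an average over the window, so it would not show whether individual refreshes were late or skipped.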

I believe an optimisation was added a while back, but do not recall the version. If I remember correctly, it skips refreshes if the index is not queried, but a query can trigger a refresh.

Forgot to add, we are on v 7.10.

If I remember correctly, it skips refreshes if the index is not queried, but a query can trigger a refresh.

There was no querying happening on these indices, but they were still being refreshed at some rate. Looking at the code, the scheduled refresh should just happen every refresh interval.

That version is very old and has been EOL for a long time. I recommend upgrading to at least version 7.17.
