What happens during Elasticsearch refresh?

Hello,

I'm trying to understand how data flows through Elasticsearch. Based on what I have gathered so far:

  1. New data is first appended to the translog and to an in-memory buffer.
  2. A refresh (explicit or implicit) converts the data in the in-memory buffer into an in-memory segment and makes it available for searching.
  3. A flush (explicit or implicit) invokes fsync and commits the data to disk, ensuring durability, and clears the translog.

Can anyone help me understand what exactly "refresh" does to make the data searchable?

Additionally, I performed the following steps:

  1. Created a test index and explicitly disabled refresh by setting refresh_interval to -1.
  2. Added a new document with ?refresh=false. It was appended to the translog, and no segment was generated in the index location.
  3. The data was not searchable yet, since no refresh had been invoked.
  4. Flushed the test index explicitly, which created the segments and a commit point on disk and truncated the translog.
  5. The data was still not searchable.
  6. I was only able to search the data after invoking a refresh!
    There was no difference in files or sizes before and after the last refresh step, so nothing new was added or modified on disk.
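
For anyone who wants to repeat these steps, here is a rough sketch of the request sequence in Python. The endpoints (index settings with refresh_interval, ?refresh=false, _flush, _refresh, _search) are the standard REST ones, but the index name, the sample document, and the base URL are placeholders I made up, and the send helper assumes a local test node reachable over plain HTTP without authentication, which is not the default on recent versions.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: unsecured local test node
INDEX = "refresh-test"              # hypothetical index name


def experiment_requests(index=INDEX):
    """Return the (method, path, body) sequence matching the steps above."""
    return [
        # 1. Create the index with periodic refresh disabled.
        ("PUT", f"/{index}", {"settings": {"index": {"refresh_interval": "-1"}}}),
        # 2. Index one document without forcing a refresh.
        ("PUT", f"/{index}/_doc/1?refresh=false", {"message": "hello"}),
        # 3. Search: the document is not expected to be visible yet.
        ("GET", f"/{index}/_search", None),
        # 4. Flush: Lucene commit plus translog truncation.
        ("POST", f"/{index}/_flush", None),
        # 5. Search again: still not visible in the experiment above.
        ("GET", f"/{index}/_search", None),
        # 6. Refresh, then search: the document should now be returned.
        ("POST", f"/{index}/_refresh", None),
        ("GET", f"/{index}/_search", None),
    ]


def send(method, path, body=None, base_url=BASE_URL):
    """Send one JSON request and return the decoded JSON response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        base_url + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


def run_experiment():
    """Run every step in order and print each response (needs a live node)."""
    for method, path, body in experiment_requests():
        print(method, path, send(method, path, body))
```

Calling run_experiment() against a disposable cluster should reproduce the sequence; comparing the hit counts of the three searches is the interesting part.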

What did refresh do to make the data searchable? Did it just open some Lucene index searcher on the data?

Any help is greatly appreciated!!!

11 years old but I think it is still valid.

Hi Stephen,

Thank you for sharing this reference.

So a refresh opens an index reader/searcher that serves searches over the data present at that point in time, and every subsequent refresh reopens the readers!
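
To check my understanding, I wrote a toy model of this in Python. It is not Elasticsearch or Lucene code, and the class and method names are made up for illustration. It only captures the idea that searches go through a point-in-time snapshot of segments, that flush writes segments and clears the translog without changing that snapshot, and that refresh is what swaps in a new snapshot.

```python
class ToyShard:
    """Toy model of one shard; all names here are hypothetical."""

    def __init__(self):
        self.translog = []   # operations not yet covered by a commit
        self.buffer = []     # indexed documents not yet in a segment
        self.segments = []   # immutable segments, each a tuple of documents
        self.visible = ()    # snapshot of segments the current searcher sees

    def index(self, doc):
        # A new document goes to both the translog and the in-memory buffer.
        self.translog.append(doc)
        self.buffer.append(doc)

    def _write_segment(self):
        # Turn buffered documents into a new immutable segment.
        if self.buffer:
            self.segments.append(tuple(self.buffer))
            self.buffer = []

    def flush(self):
        # Commit: segments are written and the translog is cleared,
        # but the searcher snapshot is left untouched.
        self._write_segment()
        self.translog = []

    def refresh(self):
        # Reopen the searcher on whatever segments exist right now.
        self._write_segment()
        self.visible = tuple(self.segments)

    def search(self, term):
        # Only segments in the current snapshot are searched.
        return [doc for seg in self.visible for doc in seg if term in doc]


shard = ToyShard()
shard.index("hello world")
before_flush = shard.search("hello")    # nothing visible yet
shard.flush()
after_flush = shard.search("hello")     # segment exists, snapshot is stale
shard.refresh()
after_refresh = shard.search("hello")   # new snapshot includes the segment
```

In this model the last refresh changes no stored data at all, only which segments the searcher points to, which matches what I saw with the file sizes.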