Hello,
I'm trying to understand the flow of data inside Elasticsearch. Based on what I have gathered so far:
- New data is first appended to the translog and to an in-memory buffer.
- A refresh (explicit or implicit) converts the data in the in-memory buffer into an in-memory segment and makes it available for searching.
- A flush (explicit or implicit) invokes fsync and commits the data to disk, ensuring durability, and clears the translog.
Can anyone help me understand what exactly "refresh" does to make the data searchable?
Additionally, I performed the following steps:
- Created a test index and explicitly disabled refresh by setting refresh_interval to -1.
- Added a new document with ?refresh=false. It was appended to the translog, and no segment was generated in the index location.
- The data was not searchable yet, as no refresh had been invoked.
- Flushed the test index explicitly, which created the segments and a commit point on disk and truncated the translog.
- The data was still not searchable.
- I was only able to search the data after invoking a refresh!
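For anyone who wants to reproduce this, the steps above can be written out roughly as the following sequence of REST calls. This is only a sketch: the index name `test`, the document ID `1`, the document body, and the helper names `reproduction_steps` and `as_console` are placeholders I made up, and the typeless `_doc` path assumes a reasonably recent Elasticsearch version. The script only builds and prints the requests; it does not contact a cluster.

```python
import json

INDEX = "test"  # placeholder index name, not necessarily the one I used


def reproduction_steps(index=INDEX):
    """Return the experiment as a list of (method, path, body) tuples."""
    match_all = {"query": {"match_all": {}}}
    return [
        # 1. Create the index with periodic refresh disabled.
        ("PUT", f"/{index}", {"settings": {"index": {"refresh_interval": "-1"}}}),
        # 2. Index one document without forcing a refresh.
        ("PUT", f"/{index}/_doc/1?refresh=false", {"message": "hello"}),
        # 3. Search: I saw no hits here.
        ("GET", f"/{index}/_search", match_all),
        # 4. Flush explicitly (segments and commit point appear on disk).
        ("POST", f"/{index}/_flush", None),
        # 5. Search again: I still saw no hits.
        ("GET", f"/{index}/_search", match_all),
        # 6. Refresh explicitly.
        ("POST", f"/{index}/_refresh", None),
        # 7. Search once more: the document was finally returned.
        ("GET", f"/{index}/_search", match_all),
    ]


def as_console(steps):
    """Render the steps as request lines, each followed by its JSON body if any."""
    lines = []
    for method, path, body in steps:
        lines.append(f"{method} {path}")
        if body is not None:
            lines.append(json.dumps(body))
    return "\n".join(lines)


if __name__ == "__main__":
    print(as_console(reproduction_steps()))
```

The printed lines can be pasted into a REST console or sent with any HTTP client.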
There was no difference in the files or their sizes before and after the last refresh step, so nothing new was added or modified on disk.
What did refresh do to make the data searchable? Did it just open some Lucene index searcher on the data?
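To make my guess concrete, here is a toy Python model of what I think might be happening. It is not Elasticsearch or Lucene code, and every name in it (`ToyShard`, `searcher_view`, and so on) is made up. The key assumption, which is exactly what I would like confirmed or corrected, is that searches only see a point-in-time snapshot of the segments, and that only a refresh replaces that snapshot.

```python
class ToyShard:
    """Toy model of a single shard. NOT real Elasticsearch behaviour, just my guess."""

    def __init__(self):
        self.translog = []       # operations not yet covered by a commit
        self.buffer = []         # indexed docs that are not in any segment yet
        self.segments = []       # each segment is an immutable tuple of docs
        self.committed = 0       # how many segments the last commit covers
        self.searcher_view = ()  # snapshot of segments that searches can see

    def index(self, doc):
        # New data goes to both the translog and the in-memory buffer.
        self.translog.append(doc)
        self.buffer.append(doc)

    def _write_segment(self):
        # Turn the buffered docs into a new segment, if there are any.
        if self.buffer:
            self.segments.append(tuple(self.buffer))
            self.buffer = []

    def refresh(self):
        # Assumption: refresh writes a segment if needed and reopens the searcher.
        self._write_segment()
        self.searcher_view = tuple(self.segments)

    def flush(self):
        # Assumption: flush commits segments and clears the translog,
        # but leaves the search-visible snapshot untouched.
        self._write_segment()
        self.committed = len(self.segments)
        self.translog = []

    def search(self, word):
        # Searches only look at the snapshot, never at buffer or new segments.
        return [doc for seg in self.searcher_view for doc in seg if word in doc]


if __name__ == "__main__":
    shard = ToyShard()
    shard.index("hello world")
    shard.flush()
    print(shard.search("hello"))  # [] : committed, but not visible
    shard.refresh()
    print(shard.search("hello"))  # ['hello world'] : visible after refresh
```

In this model the final refresh changes nothing on disk, which would match what I observed. Is that roughly what the real refresh does?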
Any help is greatly appreciated!!!