The background is that documents enter a queue before being indexed.
A document is versioned first. At that point it is placed into a RAM buffer, which is a tiny Lucene in-memory segment. Once Lucene holds the document, it can offer rudimentary retrieval operations by doc ID. Elasticsearch simply reuses this Lucene near-real-time feature and exposes it as an Elasticsearch feature that works across all shards of a distributed index.
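This retrieval-by-ID path is what the document APIs use. A minimal sketch in Kibana Console syntax, assuming a hypothetical index `my-index` and document ID `1`:

```
# Index (and version) a document; it first lands in the RAM buffer
PUT /my-index/_doc/1
{
  "title": "hello world"
}

# A GET by doc ID can return the document right away,
# even before it is visible to search
GET /my-index/_doc/1
```

The index name and ID here are illustrative; the point is that the GET-by-ID path does not have to wait for the document to become searchable.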
Search comes later. I skip the "translog" story (it can be seen as a write-ahead log). From the RAM buffer, the document is analyzed, tokenized, filtered, etc., producing a token stream that finally enters a dictionary data structure on disk: the inverted index. This inverted index is synced to disk from time to time, making the document searchable, and each sync step produces a segment. Because this is an I/O-expensive operation, Elasticsearch flushes the buffer only every 1s (by default), in order to batch as many documents as possible into each new Lucene segment. So you cannot search for terms in a document instantly, but only after a short delay, which saves a lot of expensive I/O.
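That 1s delay is the index setting `index.refresh_interval`, which you can tune per index. A sketch in Console syntax, assuming a hypothetical index `my-index`; `30s` is just an example value for a bulk-loading scenario:

```
# Lengthen the refresh interval to batch more documents per segment
PUT /my-index/_settings
{
  "index": {
    "refresh_interval": "30s"
  }
}

# Setting it to "-1" disables automatic refreshing entirely
```

A longer interval trades search freshness for fewer, larger segments and less I/O, which is why it is a common tweak during heavy indexing.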
There is a manual operation called _refresh which combines the necessary steps: it clears the RAM buffers, writes all outstanding data to disk, and switches the Lucene state to the latest segments. This operation should not be executed manually too often, because it interferes with the automatic refreshing and only adds extra load.
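For completeness, the manual operation is exposed as a REST endpoint. A sketch in Console syntax, again assuming a hypothetical index `my-index`:

```
# Force a refresh of one index, making all buffered documents searchable
POST /my-index/_refresh

# Or refresh every index in the cluster (use sparingly)
POST /_refresh
```

Reserve this for cases like tests or controlled bulk loads where you must see documents in search results immediately; in normal operation, let the automatic 1s refresh do the work.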