Data stream rollover & writing documents at pre-rollover date

Hi,

I'm brand new to the concept of data streams, and I'm trying to understand the concepts & limits behind them. So sorry in advance if my question is a dumb one.

Let's say:

  • I configure a data stream "foo" with rollover of "max age = 1 hour" ; so every hour a new backing index is created ; starting with foo-000001 today at 1:00AM

  • I have 2 applications, APP1 & APP2, on two different servers, continuously logging data in elasticsearch via their respective filebeats, collecting app's log files ; each application writes one log every second.

At 7:00AM, the data stream’s write index is foo-000007, and both applications continue to send logs every second.

At 7:58AM, because of a temporary failure of part of my network, filebeat of APP1 is no longer able to reach elasticsearch (while APP2 continues to write logs in the data stream).

At 8:00AM, rollover happens and the new data stream’s write index becomes foo-000008 ; APP2 continues to write logs in it while APP1 doesn't.

At 8:05AM, network issue ends. APP1's filebeat starts again to send the logs to elasticsearch. But it starts with logs of 7:58AM.

=> Given

  • the data stream's write index has moved in the meantime to foo-000008
  • APP2 has already filled the data stream with logs from 8:00AM to 8:05AM
  • data streams are "append-only time series data"

=> will elasticsearch refuse to store the logs sent by filebeat of APP1, with @timestamp between 7:58AM and 8:05AM ? And thus will I lose all logs of APP1 between 7:58AM and 8:05AM ?

Thanks in advance.

Hi @nouknouk Welcome to the community.

Good Question

See Here

No, elasticsearch will not refuse them: it writes the documents into whatever the current write (backing) index is. The timestamp of the document is not gated/checked upon writing. Yes, in general documents end up in the backing index that corresponds to their time, but there are lags, disruptions, and other delays in documents being written, and those late documents are simply written into the current write index. There is no guarantee (nor any actual requirement) that the document timestamp matches the backing index rollover timing. There are some "smarts" that help elasticsearch optimize searching, such as keeping track of the min / max timestamps in the backing indices.

When you search the data, you will most likely be searching the Data Stream via a Data View with a time filter; again, there is logic to optimize what is searched and what is returned.
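To make the behaviour above concrete, here is a minimal toy model in Python. It is only a sketch, not Elasticsearch code: the `ToyDataStream` and `BackingIndex` classes are hypothetical, the backing index names follow the simplified `foo-00000N` form used in this thread, and the search-side skipping is a rough stand-in for the min / max timestamp optimization mentioned above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class BackingIndex:
    name: str
    docs: list = field(default_factory=list)

    def timestamp_range(self):
        """Return (min, max) @timestamp held by this index, or None if empty."""
        if not self.docs:
            return None
        stamps = [d["@timestamp"] for d in self.docs]
        return min(stamps), max(stamps)


class ToyDataStream:
    """Simplified model: every write lands in the newest backing index."""

    def __init__(self, name):
        self.name = name
        self.generation = 0
        self.indices = []
        self.rollover()

    def rollover(self):
        self.generation += 1
        self.indices.append(BackingIndex(f"{self.name}-{self.generation:06d}"))

    @property
    def write_index(self):
        return self.indices[-1]

    def index(self, doc):
        # The document's @timestamp is NOT compared with the rollover time.
        self.write_index.docs.append(doc)
        return self.write_index.name

    def search(self, start, end):
        hits = []
        for idx in self.indices:
            rng = idx.timestamp_range()
            if rng is None or rng[1] < start or rng[0] > end:
                continue  # index cannot contain a match, so it is skipped
            hits += [d for d in idx.docs if start <= d["@timestamp"] <= end]
        return hits


day = datetime(2024, 1, 1)  # arbitrary date, chosen for illustration
stream = ToyDataStream("foo")
for _ in range(6):
    stream.rollover()  # write index is now foo-000007

stream.index({"@timestamp": day + timedelta(hours=7, minutes=59), "app": "APP2"})
stream.rollover()  # the 8:00AM rollover: write index becomes foo-000008
stream.index({"@timestamp": day + timedelta(hours=8, minutes=1), "app": "APP2"})

# APP1 reconnects at 8:05AM and replays a log stamped 7:58AM.
late_target = stream.index(
    {"@timestamp": day + timedelta(hours=7, minutes=58), "app": "APP1"}
)

# A time-filtered search still finds the late document, whichever index holds it.
window = stream.search(day + timedelta(hours=7, minutes=55), day + timedelta(hours=8))
```

In this model the late 7:58AM document is stored in foo-000008, and a search for 7:55AM to 8:00AM returns it alongside the APP2 document that sits in foo-000007.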

With Respect to "Append Only"

Append-only

Data streams are designed for use cases where existing data is rarely, if ever, updated. You cannot send update or deletion requests for existing documents directly to a data stream. Instead, use the update by query and delete by query APIs.

If needed, you can update or delete documents by submitting requests directly to the document’s backing index.

If you frequently update or delete existing time series data, use an index alias with a write index instead of a data stream. See Manage time series data without data streams.
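As a rough illustration of the two routes described in that quote, the Python sketch below only builds the REST requests (method, path, body) and does not send anything. The helper names, the `app` field, and the document ID `abc123` are made up for the example; the `_delete_by_query` endpoint and the `if_seq_no` / `if_primary_term` parameters are the ones the data stream documentation uses, and the backing index name, `_id`, `_seq_no` and `_primary_term` would normally come from a prior search response.

```python
import json
from urllib.parse import quote, urlencode


def delete_by_query(data_stream, query):
    """Build a request that deletes matching documents via the data stream."""
    path = f"/{quote(data_stream)}/_delete_by_query"
    return "POST", path, json.dumps({"query": query})


def update_in_backing_index(backing_index, doc_id, seq_no, primary_term, source):
    """Build a request that overwrites one document in its backing index.

    The request targets the backing index directly, not the data stream.
    """
    params = urlencode({"if_seq_no": seq_no, "if_primary_term": primary_term})
    path = f"/{quote(backing_index)}/_doc/{quote(doc_id)}?{params}"
    return "PUT", path, json.dumps(source)


# Hypothetical values, for illustration only.
req_delete = delete_by_query("foo", {"match": {"app": "APP1"}})
req_update = update_in_backing_index(
    "foo-000008", "abc123", 0, 1,
    {"@timestamp": "2024-01-01T07:58:00Z", "app": "APP1", "message": "fixed"},
)
```

Either tuple can then be handed to whatever HTTP client you already use.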

Hope this helps...

hi @stephenb ,

Thank you :grinning:

Ok. The more I was thinking about it, the more it makes sense, as otherwise, the same problem would arise a few milliseconds before & after any rollover.

For sure. Thanks for your answer !

