I take that back. I looked at the other thread where I noted an issue with Filebeat (the thread wasn't specifically about Filebeat, but about another Beat I was writing), and this was the relevant snippet:
"Maybe this helps - in my Filebeat debug output, it notes All prospectors are initialized and running with 2043 states to persist. The failure occurs at file 1017. My ulimit was 1024. I went ahead and set this to 8192 and tried the test again - file open errors are gone, but the result is the same. After file 1020, I get:"
Having since solved the issue with the Beat I was working on, and looking at the debug output again, those errors were all related to that Beat's issue, not to Filebeat itself. So never mind.
I'll give the harvester_limit a try.
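For reference, here is a minimal sketch of the kind of config I'd try. `harvester_limit` and `close_eof` are documented Filebeat options, but the exact top-level keys vary between Filebeat versions (e.g. `filebeat.prospectors` vs. `filebeat.inputs`), and the path below is just a placeholder:

```yaml
filebeat.prospectors:
  - type: log
    paths:
      - /var/log/myapp/*.log   # hypothetical path, substitute your own
    # Allow only one open harvester at a time, so events from
    # different files are less likely to interleave in the output.
    harvester_limit: 1
    # Release the harvester as soon as it hits EOF, so the next
    # file can be picked up on the following scan.
    close_eof: true
```

With `harvester_limit: 1`, files are still chosen in no guaranteed order, so this reduces interleaving within the stream rather than enforcing true chronological order.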
Right, I agree about timestamps; I would never assume logs arrive in any precise order. But the situation I'm trying to avoid is one where a backlog of 3 days of logs is being harvested and the data received bounces between different days and hours. I know that ultimately this is a cosmetic issue, but when presenting this stream of logs to a user, whether via Kibana or Graylog, it looks wrong if a batch of events from 2017-10-20 16:00:00 shows up, then 2017-10-19 23:30:00, then 2017-10-21 03:10:00, etc.
Another situation, which is less cosmetic, is a best-effort analysis being done on a stream of events where one specific event needs to be flagged first, and then a related following event is flagged (e.g. a request to a specific website, followed by a response). These two events would be in the same logs, logged in order, but might be split between two different files due to log rotation.
Another situation that is order dependent is when timestamps are used as a trigger for a batch job; for example, some kind of report is generated for 2017-10-22 once events start coming in for 2017-10-23. Suppose there is a connection problem on the sending host for a few days and then it recovers: if the harvesters send 3 days of logs in an unordered manner, ES gets a batch of events for the 3rd day first, which prematurely triggers the batch jobs for the previous 2 days.
Another situation has to do with expiry. Let's say Filebeat is configured to only harvest the last 48 hours of logs, and the connection has very limited bandwidth. A spike in activity causes a backlog that takes a few days to recover from. As the log files get older, they start to fall outside the harvest age limit, so instead of pseudorandom reception of events, we end up with pseudorandom gaps in the data.