Will we miss documents?
Filebeat employs infinite retry: internally it drops no events.
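For context, the events waiting to be published sit in Filebeat's in-memory queue. A minimal sketch of sizing that buffer (the `queue.mem` settings exist in the Beats reference; the values here are illustrative, not recommendations):

```yaml
# filebeat.yml (sketch) -- size the internal event buffer
queue.mem:
  events: 4096            # max events held in memory before readers block
  flush.min_events: 2048  # publish once this many events are buffered...
  flush.timeout: 1s       # ...or after this timeout, whichever comes first
```

A larger queue absorbs short output outages, but once it fills, reading from the log files stops, as described below.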
But as always, it depends. If the output is down, no events can be published and Filebeat blocks. Once its buffers are full, no new lines are read from your log files. If Kafka/Filebeat stays in this non-processing state for too long and file rotation kicks in, two things can happen (depending on configuration): a) missing documents, because the original files have been deleted, or b) Filebeat keeps the files open (a file is only truly deleted once no process has it open anymore), eventually allowing the system to run out of disk space.
That is, your disk acts like a queue. It is up to you whether you would rather drop events or potentially run out of disk space. Depending on sizing and log frequency, this can happen after days or within minutes.
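That choice between a) and b) is made via the log input's close settings. A sketch of the two directions (option names `close_removed` and `close_timeout` are real Filebeat log-input options; the path and values are hypothetical, and the exact interaction with a blocked output depends on your Filebeat version):

```yaml
# filebeat.yml (sketch) -- pick your failure mode under rotation
filebeat.inputs:
  - type: log
    paths:
      - /var/log/app/*.log   # hypothetical path
    # Keep deleted (rotated-away) files open until fully read:
    # no lost lines, but the disk can fill up while the output is down.
    close_removed: false
    # Alternatively, force-close each harvester after a deadline:
    # frees disk space, but unread lines in the rotated file are lost.
    # close_timeout: 30m
```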
Plus, if Kafka becomes available again, Filebeat might have quite a backlog of data to publish -> increased CPU, disk, and network usage. If these do not increase, Filebeat + Kafka were already running at their limit.
Will the exact number of documents be indexed?
Will more than the expected number of documents be indexed (NOT checking for duplicates for now)?
Beats retry publishing an event until Kafka has ACKed it. If Kafka does not ACK, or we hit a network error, Beats have no idea whether Kafka processed the event or not -> the same event is sent again -> duplicates.
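This at-least-once behavior is shaped by the Kafka output's ACK setting. A sketch (the `hosts`, `topic`, and `required_acks` options are real Kafka-output settings; the broker addresses and topic name are placeholders):

```yaml
# filebeat.yml (sketch) -- Kafka output ACK semantics
output.kafka:
  hosts: ["kafka1:9092", "kafka2:9092"]  # placeholder brokers
  topic: "filebeat"                      # placeholder topic
  # 1  = wait for the partition leader's ACK (default)
  # 0  = no ACK at all: Filebeat cannot retry, events can be lost silently
  # -1 = wait for all in-sync replicas: safest, highest latency
  required_acks: 1
```

With `required_acks: 1` or `-1`, a lost ACK means the event is resent and may be indexed twice; deduplication has to happen downstream if that matters.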