Didn't read the code, but this makes me wonder about NiFi internal queues blocking:
Thank goodness, otherwise you may not have replied.
Partial ACKs are fine. The effect on beats is: if the connection is dropped due to a timeout, the subset of events that have already been ACKed will not be resent. This not-resending only applies to the sending pipeline, though. As filebeat operates in batches, the ACK will not be forwarded to the filebeat registry. That is, if filebeat is restarted in the middle of a failing connection, the complete batch is retransferred.
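To make the partial-ACK mechanics concrete, here is a minimal sketch of the Lumberjack v2 ACK frame: a version byte `2`, a type byte `A`, then the highest contiguous sequence number persisted so far as a big-endian uint32. The helper names are mine, not part of any library:

```python
import struct

def encode_ack(sequence: int) -> bytes:
    """Encode a Lumberjack v2 ACK frame: version '2', type 'A',
    then the highest contiguous sequence ACKed as big-endian uint32."""
    return b"2A" + struct.pack(">I", sequence)

def decode_ack(frame: bytes) -> int:
    """Return the sequence number carried by an ACK frame."""
    assert frame[:2] == b"2A", "not a v2 ACK frame"
    return struct.unpack(">I", frame[2:6])[0]
```

So if the receiver has persisted events 1..37 of a 100-event window and then the connection drops, the sender only needs to resend 38..100 within that sending session — which is exactly the "not forwarded to the registry" caveat above: a restart of filebeat forgets this partial progress.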
Interesting. I wasn't aware of the ACKing being restricted to the sending side only, but I guess I'd rather receive an event at least once than miss the data.
Also, to a certain extent I suspect this can be described as beyond the scope of the protocol, being a particular design choice of filebeat. Would you agree?
Partial ACKs (even ACK 0) have a second function too: they double as a heartbeat signal. Beats has a default connection idle-timeout of 30 seconds. In case the Logstash pipeline is blocked, the beats-input-plugin sends an ACK every 5 seconds. This prevents beats from reconnecting and sending the batch once again.
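A sketch of that keepalive behaviour with a simulated clock (the function and constant names are illustrative, not the actual plugin code): while downstream is blocked, the receiver emits ACK 0 frames every 5 seconds, so the gap between ACKs never reaches the client's 30-second idle timeout.

```python
KEEPALIVE_INTERVAL = 5.0    # beats-input-plugin heartbeat interval
CLIENT_IDLE_TIMEOUT = 30.0  # beats default connection idle-timeout

def run_blocked_window(blocked_seconds: float,
                       interval: float = KEEPALIVE_INTERVAL):
    """Simulate the receiver side while downstream is blocked:
    return the list of (timestamp, sequence) ACK frames it would emit.
    All heartbeats carry sequence 0, i.e. nothing newly persisted."""
    acks, t = [], 0.0
    while t + interval <= blocked_seconds:
        t += interval
        acks.append((t, 0))
    return acks
```

E.g. a pipeline blocked for 17 seconds produces heartbeats at t=5, 10, 15 — the client never sees 30 seconds of silence, so it keeps the connection and does not resend the batch.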
I will have to double check, but it shouldn't be impossible to mimic the beats-input-plugin.
e.g. what happens if the NiFi queue is currently stopped? Without an ACK, beats will eventually time out the connection, as no ACK has been received, and resend the complete batch (unnecessarily using network resources and potentially duplicating events due to resends).
If I understand your example correctly, the impact should in theory be limited.
The processor (the NiFi terminology for this "plugin") should ACK the event as soon as the data is persisted to disk (i.e. a new flowfile is created).
In theory, the downstream pipeline could grind to a halt without prejudice to the sending agent; instead, the processor should still make sure events are queued in accordance with the "queue" settings (i.e. kept until they expire due to age or the queue exceeds its capacity).
So, in the hypothetical situation where a NiFi downstream flow is totally stopped - e.g. the HDFS cluster is offline - the queue (or "connection" in NiFi lingo) should continue to accept new events until it grows too large (# of bytes) or holds too many events. Only then should events no longer be ACKed.
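As a rough sketch of that back-pressure rule (the class and thresholds are hypothetical, not the real NiFi API): accept and ACK each event while the queue is under both limits, and withhold the ACK once either limit is hit, so the client falls back to its timeout-and-retry behaviour.

```python
from dataclasses import dataclass, field

@dataclass
class FlowQueue:
    """Toy model of a NiFi connection with back-pressure thresholds
    on event count and total bytes (names are illustrative)."""
    max_events: int = 10_000
    max_bytes: int = 1_000_000
    events: list = field(default_factory=list)
    size_bytes: int = 0

    def offer(self, event: bytes) -> bool:
        """Persist the event and return True (=> send ACK), unless the
        queue already exceeds either threshold, in which case the event
        is rejected and no ACK goes out (client will retry later)."""
        if len(self.events) >= self.max_events or self.size_bytes >= self.max_bytes:
            return False  # back-pressure engaged: stop ACKing
        self.events.append(event)
        self.size_bytes += len(event)
        return True
```

The design choice this models: back-pressure is signalled purely by withholding ACKs rather than by closing the connection, which keeps the sender's retry logic simple.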
Did that make sense?