Window size never grows

In our service environment,
some Filebeat instances consume almost 100% of a CPU core.

After checking the lumberjack protocol and a tcpdump of the connection,
I found that Filebeat always sends a window size of 1.

From the source, it seems that once windowSize becomes equal to maxOkWindowSize (e.g., both are 1),
it never changes again.
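A simplified, hypothetical model of this growth rule makes the stall easy to see. It is not Filebeat's source: the type `window`, the methods `tryGrow`/`shrink`, the 1.5x growth factor and the halving on failure are assumptions chosen for illustration. Growth happens only when `windowSize` and `maxOkWindowSize` differ, so once they are equal nothing moves.

```go
package main

import (
	"fmt"
	"math"
)

// window is a HYPOTHETICAL, simplified model of an adaptive send window.
// Field names mirror the thread; this is not Filebeat's source code.
type window struct {
	windowSize      int // events sent per batch
	maxOkWindowSize int // largest window that was sent successfully
	maxWindowSize   int // hard upper bound
}

const minWindowSize = 1

// tryGrow is called after a batch is ACKed (assumed growth rule).
func (w *window) tryGrow() {
	switch {
	case w.maxOkWindowSize < w.windowSize:
		// New high-water mark: remember it, then grow by 1.5x.
		w.maxOkWindowSize = w.windowSize
		w.windowSize = int(math.Ceil(1.5 * float64(w.windowSize)))
	case w.windowSize < w.maxOkWindowSize:
		// Recovering after a shrink: grow by 1.5x, capped at maxOkWindowSize.
		w.windowSize = int(math.Ceil(1.5 * float64(w.windowSize)))
		if w.windowSize > w.maxOkWindowSize {
			w.windowSize = w.maxOkWindowSize
		}
	}
	// windowSize == maxOkWindowSize matches neither case: the window is stuck.
	if w.windowSize > w.maxWindowSize {
		w.windowSize = w.maxWindowSize
	}
}

// shrink is called after a failed send: halve, but never below minWindowSize.
func (w *window) shrink() {
	w.windowSize /= 2
	if w.windowSize < minWindowSize {
		w.windowSize = minWindowSize
	}
}

func main() {
	w := &window{windowSize: 1, maxOkWindowSize: 1, maxWindowSize: 2048}
	for i := 0; i < 100; i++ {
		w.tryGrow()
	}
	fmt.Println(w.windowSize, w.maxOkWindowSize) // 1 1
}
```

In this model, starting from `windowSize: 10` and `maxOkWindowSize: 0`, one successful send yields 15 and 10, and five calls to `shrink` from 15 bring the window down to 1. Once the two values are equal, `tryGrow` is a no-op no matter how many batches succeed.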

Would setting a TTL mitigate this, by resetting the window size to 10?

Or is there a plan to implement the following commented-out TODO from the source?

// TODO: use duration until ACK to estimate an ok max window size value

Hello @keyolk
I am a bit surprised that the window size is always 1, because that would mean that Filebeat has really low traffic or something else is off.

Before we go deeper into debugging, could you provide the following information:

  • Filebeat version
  • Filebeat configuration

Thanks

Hi @pierhugues
Here is what I assume is happening:

The Filebeat process runs in a Docker container on a NATed network, with 5 Logstash output hosts.
At the beginning, it has 5 established connections to Logstash.

A single log line is sent to Logstash; the window size then grows from its initial value of 10 to 15, and maxOkWindowSize becomes 10.

Several hours later, with no incoming logs, the connection to Logstash is dropped (by the conntrack TCP established timeout or the Logstash idle timeout), but the socket is still open.

When a new log line arrives, Filebeat fails to send it 5 times and windowSize shrinks to 1.
It reopens the connection and sends the log successfully.
Now windowSize is 1 and maxOkWindowSize is 2.

Several more hours pass and it fails again.
Now both windowSize and maxOkWindowSize are 1.
They never change again without a TTL (which seems to be available from v6.0).

We use Filebeat 5.4.2 with the following output configuration.

...
output.logstash:
  hosts: ${LOGSTASH_HOSTS}
  loadbalance: true
...
...

I think the options to solve it are:

  • Enable TTL to refresh the window size without writing any code
  • Implement TCP keep-alive
  • Implement the TODO part
  • Try another heuristic workaround, such as:
    • adding 1 to maxOkWindowSize if it equals windowSize
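The last heuristic can be sketched on top of a simplified, hypothetical window model (not Filebeat's code; the names and the 1.5x factor are assumptions): when `windowSize` equals `maxOkWindowSize`, bump `maxOkWindowSize` by 1 so the recovery branch can fire again.

```go
package main

import (
	"fmt"
	"math"
)

// window is a HYPOTHETICAL model of the send window, not Filebeat's code.
type window struct {
	windowSize, maxOkWindowSize, maxWindowSize int
}

// tryGrow is called after a successful send, with the proposed tie-break.
func (w *window) tryGrow() {
	// Heuristic: break the windowSize == maxOkWindowSize tie (below the cap).
	if w.windowSize == w.maxOkWindowSize && w.maxOkWindowSize < w.maxWindowSize {
		w.maxOkWindowSize++
	}
	switch {
	case w.maxOkWindowSize < w.windowSize:
		// New high-water mark: remember it, then grow by 1.5x.
		w.maxOkWindowSize = w.windowSize
		w.windowSize = int(math.Ceil(1.5 * float64(w.windowSize)))
	case w.windowSize < w.maxOkWindowSize:
		// Recovering: grow by 1.5x, capped at maxOkWindowSize.
		w.windowSize = int(math.Ceil(1.5 * float64(w.windowSize)))
		if w.windowSize > w.maxOkWindowSize {
			w.windowSize = w.maxOkWindowSize
		}
	}
	if w.windowSize > w.maxWindowSize {
		w.windowSize = w.maxWindowSize
	}
}

func main() {
	w := &window{windowSize: 1, maxOkWindowSize: 1, maxWindowSize: 2048}
	for i := 0; i < 9; i++ {
		w.tryGrow()
	}
	fmt.Println(w.windowSize, w.maxOkWindowSize) // 10 10
}
```

With this tie-break the window recovers only linearly, by one event per successful batch, so from 1 it takes nine successful sends to get back to 10. That avoids the permanent stall but is slow compared with the 1.5x growth used elsewhere.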

Before changing any code, have you looked at upgrading to 6.6.x? We have changed a few things there.

Thanks @pierhugues, I checked the recent changes.
It seems that with the default configuration the issue does not reproduce.
I can only reproduce the same situation by setting "slow_start=true".
Will "slow_start mode" be deprecated in the future?

Good to hear that. Concerning slow start, we have no plans to deprecate it in the short term, but it's not really needed anymore because Logstash and Beats support partial ACKs, so Logstash doesn't need to ACK the full window.
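A rough sketch of what partial ACKs mean for the sender, assuming each ACK carries the cumulative sequence number of the last event processed: the sender records progress from every partial ACK and considers the window complete only once the final sequence number arrives. `pending` and `onACK` are hypothetical names, not Beats code.

```go
package main

import "fmt"

// pending is a HYPOTHETICAL tracker for one in-flight window. It ASSUMES
// each ACK carries the cumulative sequence number of the last event processed.
type pending struct {
	total int // events in the window
	acked int // highest sequence number ACKed so far
}

// onACK records a possibly partial ACK. It reports whether the ACK made
// progress and whether the whole window is now acknowledged.
func (p *pending) onACK(seq int) (progress, done bool) {
	if seq > p.total {
		seq = p.total // clamp sequence numbers beyond the window
	}
	if seq > p.acked {
		p.acked = seq
		progress = true
	}
	return progress, p.acked == p.total
}

func main() {
	p := &pending{total: 100}
	for _, seq := range []int{20, 20, 70, 100} {
		progress, done := p.onACK(seq)
		fmt.Println(seq, progress, done)
	}
}
```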

