I'm having a problem with the Logstash tcp output plugin: reconnect_interval does not seem to have any effect on the plugin's actual reconnect attempts. I may be misunderstanding what the field is for, but no matter what I set it to, I see the following pattern of reconnect attempts when the TCP server I'm trying to connect to is down:
Failed (Connection refused - connect(2) for "10.0.20.158" port 13370). Sleeping for 0.02
Failed (Connection refused - connect(2) for "10.0.20.158" port 13370). Sleeping for 0.04
Failed (Connection refused - connect(2) for "10.0.20.158" port 13370). Sleeping for 0.08
Failed (Connection refused - connect(2) for "10.0.20.158" port 13370). Sleeping for 0.16
Failed (Connection refused - connect(2) for "10.0.20.158" port 13370). Sleeping for 0.32
Failed (Connection refused - connect(2) for "10.0.20.158" port 13370). Sleeping for 0.64
Failed (Connection refused - connect(2) for "10.0.20.158" port 13370). Sleeping for 1.28
Failed (Connection refused - connect(2) for "10.0.20.158" port 13370). Sleeping for 2.0
Failed (Connection refused - connect(2) for "10.0.20.158" port 13370). Sleeping for 2.0
Failed (Connection refused - connect(2) for "10.0.20.158" port 13370). Sleeping for 2.0
That is confusing! What you are seeing is the Stud retry handling inside the connect method.
The reconnect_interval sleep only happens once connect throws an exception. The plugin then sleeps for reconnect_interval, goes back into connect, and runs the retries with exponential backoff again.
In that case I would expect the backoff to drop back to 0.02, not stay at 2.0.
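To make the pattern in the log concrete, here is a minimal sketch of a doubling backoff with a cap. It is not the plugin's or Stud's actual code; the function name `backoff_delays` and the 0.02 starting value and 2.0 cap are assumptions read off the log output above.

```python
from itertools import islice


def backoff_delays(initial=0.02, cap=2.0):
    """Yield sleep durations that double after each failure, up to a cap.

    Hypothetical helper: the defaults are taken from the log lines in
    this thread, not from the plugin source.
    """
    delay = initial
    while True:
        yield delay
        # Double the delay for the next attempt, but never exceed the cap.
        delay = min(delay * 2, cap)


# First ten sleep durations while the server stays down.
first_ten = list(islice(backoff_delays(), 10))
print(first_ten)
```

The first ten values follow the same sequence as the "Sleeping for" lines: doubling from 0.02 up to 1.28, then holding at 2.0. If the retry loop never gives up and raises, nothing resets the delay and the reconnect_interval sleep is never reached, which would explain a log that stays at 2.0.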
Thanks for your reply! Well spotted. That's interesting: I've left it running for long periods and it seems to stay at 2.0s indefinitely, and it doesn't seem to throw an exception at all. I'm not sure whether that's expected behaviour here.
Sorry to keep harping on about this, but I'm wondering whether this would be considered a bug, or at least a documentation error. The plugin docs imply that if a connection fails, the plugin will retry based on that field, but in my case, which I'd imagine is fairly common, reconnect_interval is seemingly never taken into account.