Logstash stops processing TCP output - requires restart of the service

Hi all,

We run Logstash on multiple Windows API servers, with 3 file inputs, 1 TCP output, and a bunch of filters.

Our inputs are:

input {
    file {
        path => [some file path]
        type => "iis"
        start_position => "beginning"
        sincedb_path => [some file path]
    }
    
    file {
        codec => "json"
        path => [some file path]
        type => "application"
        start_position => "beginning"
        sincedb_path => [some file path]
    }

    file {
        path => [some file path]
        type => "system-stats"
        start_position => "beginning"
        sincedb_path => [some file path]
    }
}

Our outputs are:

output {  
    tcp {
        codec => json_lines
        host => [our broker host]
        port => 6379
    }
}

We have noticed that Logstash intermittently stops emitting to the TCP output. This seems to happen somewhat randomly across the instances, although occasionally multiple instances stop processing logs at around the same time. The screenshot below shows a Kibana visualization illustrating these intermittent interruptions (across 4 instances).

The Logstash Windows service is still running on the instance, but no output is coming through. Restarting the Windows service usually kicks things off again, and we start seeing log events come through in Kibana.

The failing instances have the following as their last output:

[Api Webserver] INFO logstash.agent - Successfully started Logstash API endpoint {:port=>9600}

We are about to build a monitoring system that watches each instance for inactivity and restarts the Logstash service when needed (rough sketch below). But it feels like we shouldn't have to do that. Is there anything obvious we are missing?
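For reference, this is roughly the watchdog we have in mind: poll the Logstash monitoring API (the one listening on port 9600 in the log line above) for the cumulative events-out count, and restart the Windows service when the count stops moving. This is only a sketch; the localhost:9600 URL, the "logstash" service name, and the 5-minute threshold are assumptions for our setup, and a genuinely quiet period with no traffic would also trigger a restart, which is one reason it feels like the wrong fix.

import json
import subprocess
import time
import urllib.request

STATS_URL = "http://localhost:9600/_node/stats/events"  # Logstash monitoring API (assumed reachable locally)
SERVICE_NAME = "logstash"   # assumed Windows service name - adjust for your install
CHECK_INTERVAL = 300        # seconds between checks; unchanged counts for this long = "stuck"

def events_out():
    # Cumulative count of events Logstash has emitted since it started.
    with urllib.request.urlopen(STATS_URL, timeout=10) as resp:
        return json.load(resp)["events"]["out"]

def restart_logstash():
    # Restart the Windows service via sc.exe (needs an account allowed to manage services).
    subprocess.run(["sc", "stop", SERVICE_NAME], check=False)
    time.sleep(30)
    subprocess.run(["sc", "start", SERVICE_NAME], check=True)

last_out = None
while True:
    try:
        current_out = events_out()
    except OSError:
        current_out = None      # API unreachable; skip this round
    if last_out is not None and current_out == last_out:
        # No new events since the previous check: assume the pipeline is stuck.
        restart_logstash()
        last_out = None         # start a fresh baseline after the restart
    else:
        last_out = current_out
    time.sleep(CHECK_INTERVAL)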
