Hi,
I'm using Filebeat to read a custom log file; every record (line) is sent to Logstash to be normalized, then into Elasticsearch and finally into Kibana to visualize it.
Unfortunately, I'm seeing a lot of duplicated events. I know this could be a problem with the ACK communication between FB and LS, but I cannot find any specific ACK error in FB's logs.
Not every record is duplicated.
Not every record is duplicated the same number of times: one record may be published twice, another one 3 times (generally no more than that).
Very often I find this sequence in the FB logs:
2017-05-26T16:39:42+02:00 DBG handle error: read tcp 172.21.20.71:53429->10.0.18.131:5045: wsarecv: An existing connection was forcibly closed by the remote host.
2017-05-26T16:39:42+02:00 DBG closing
2017-05-26T16:39:42+02:00 DBG 0 events out of 143 events sent to logstash. Continue sending
2017-05-26T16:39:42+02:00 DBG close connection
2017-05-26T16:39:42+02:00 ERR Failed to publish events caused by: read tcp 172.21.20.71:53429->10.0.18.131:5045: wsarecv: An existing connection was forcibly closed by the remote host.
2017-05-26T16:39:42+02:00 INFO Error publishing events (retrying): read tcp 172.21.20.71:53429->10.0.18.131:5045: wsarecv: An existing connection was forcibly closed by the remote host.
Filebeat and Logstash are not on the same VLAN, but let me ask a naive question: if this were a firewall issue, wouldn't it create duplicated records every time?
Generally Filebeat creates duplicates when sending the events succeeds but the ACK for the batch doesn't make it back. My suspicion is that the router closes the connections on inactivity; that's why not all docs are duplicated.
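To make the retry mechanics concrete, here is a minimal filebeat.yml sketch of the Logstash output options involved. The option names are standard Filebeat settings, but the values are illustrative assumptions only, not recommendations:

output.logstash:
  hosts: ["10.0.18.131:5045"]
  # Seconds Filebeat waits for a response (ACK) from Logstash before it
  # considers the request failed and retries (default is 30).
  timeout: 60
  # Maximum events per batch. Events whose ACK never arrives are re-sent,
  # so smaller batches limit how many already-processed events can be
  # duplicated when the connection drops right before the ACK comes back.
  bulk_max_size: 512

If the beats input on the Logstash side supports it, raising its client_inactivity_timeout may also keep idle connections from being closed, but verify that against the plugin version you are actually running.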
Yeah, that error has nothing to do with the duplicate events. The "An existing connection was forcibly closed by the remote host." message indicates that either Logstash is restarting or there's a connectivity issue. Can you double-check that LS is not restarting every so often?
It's possible that I've identified the issue.
I found a lot of lines like the following:
Remove state for file as file removed: E:\Scripts\Admin\ExtractorCat\outFile_20170531_2.roger
...
...
State removed for E:\Scripts\Admin\ExtractorCat\outFile_20170531_2.roger because of older: 0s
I write every record into a specific file (e.g. outFile_20170531_2.roger). When I have to add records, I rename the file to *.rogerTMP to avoid wasting time, since Filebeat locks it.
When Filebeat runs a scan, it sometimes cannot find those files, so it removes the saved state of the original file.
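For reference, that state removal is governed by the prospector's clean_*/close_* options. The filebeat.yml sketch below only illustrates which settings are in play; the path pattern and values are assumptions to adapt and verify against your Filebeat version:

filebeat.prospectors:
  - input_type: log
    paths:
      - 'E:\Scripts\Admin\ExtractorCat\outFile_*.roger'
    # clean_removed (enabled by default) drops the registry state for any
    # file that cannot be found during a scan. With the rename-to-*.rogerTMP
    # workflow the file briefly disappears, its state is dropped, and when
    # it reappears it is read again from the beginning, producing duplicates.
    clean_removed: false
    # close_removed / close_renamed control when the harvester releases the
    # file handle after a file is removed or renamed.
    close_removed: true
    close_renamed: false

If appending to the file in place is feasible for your generator script, the file never disappears from a scan, so the state is never dropped in the first place.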
After several days I can say that this issue is not fixed on my side, so I would like to ask: what do you think is the best way to feed a file that will be parsed and inspected by Filebeat?
Logstash 5.2 ships with an old logstash-input-beats plugin. The plugin's changelog suggests some fixes regarding LS closing connections. Unless you've already updated the plugin to the most recent version, you can try updating the plugin or Logstash itself.
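If you go the plugin route, something along these lines should work from the Logstash home directory (Logstash 5.x CLI; check the installed version before and after so you can confirm the update actually happened):

bin/logstash-plugin list --verbose logstash-input-beats
bin/logstash-plugin update logstash-input-beats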