Filebeat stops connecting to Elastic after a while

Dmitriy_Parkhonin · October 3, 2018, 12:10pm

I have a setup with several dozens of filebeats (v.6.2.2) that are connected to Elastic (v.6.2.2 too).
Each filebeat monitors for several dozens of log files.
Some of the beats stop functioning properly sometimes, I can see the following in filebeat log:

|2018-10-03T12:41:23.282+0100|INFO|elasticsearch/client.go:690|Connected to Elasticsearch version 6.2.2|
|---|---|---|---|
|2018-10-03T12:41:23.282+0100|INFO|template/load.go:73|Template already exists and will not be overwritten.|
|2018-10-03T12:41:24.086+0100|ERROR|elasticsearch/client.go:299|Failed to perform any bulk index operations: Post http://lonasdpricu2:9200/_bulk: net/http: HTTP/1.x transport connection broken: write tcp xx.xx.xx.xx:64229->xx.xx.xx.xx:9200: wsasend: An existing connection was forcibly closed by the remote host.|
|2018-10-03T12:41:25.124+0100|ERROR|pipeline/output.go:92|Failed to publish events: Post http://lonasdpricu2:9200/_bulk: net/http: HTTP/1.x transport connection broken: write tcp xx.xx.xx.xx:64229->xx.xx.xx.xx:9200: wsasend: An existing connection was forcibly closed by the remote host.|
|2018-10-03T12:41:25.124+0100|INFO|elasticsearch/client.go:690|Connected to Elasticsearch version 6.2.2|
|2018-10-03T12:41:25.124+0100|INFO|template/load.go:73|Template already exists and will not be overwritten.|
|2018-10-03T12:41:25.947+0100|ERROR|elasticsearch/client.go:299|Failed to perform any bulk index operations: Post http://lonasdpricu2:9200/_bulk: net/http: HTTP/1.x transport connection broken: write tcp xx.xx.xx.xx:64231->xx.xx.xx.xx:9200: wsasend: An existing connection was forcibly closed by the remote host.|
|2018-10-03T12:41:26.987+0100|ERROR|pipeline/output.go:92|Failed to publish events: Post http://lonasdpricu2:9200/_bulk: net/http: HTTP/1.x transport connection broken: write tcp xx.xx.xx.xx:64231->xx.xx.xx.xx:9200: wsasend: An existing connection was forcibly closed by the remote host.|

There are no errors in Elastic log at this time.
The filebeat starts to process logs again only after restart.
Could you please help me with the issue? It is rather annoying to find out that some data is missing and then finding out which filebeat is stuck and then restarting it...

Mario_Castro · October 4, 2018, 7:25am

Hi @Dmitriy_Parkhonin

Which operating system are you using? I'd suggest to update Filebeat to 6.4 anyways, just in case. We had some known errors in Windows OS as you can see here https://github.com/elastic/beats/issues/905

Dmitriy_Parkhonin · October 4, 2018, 3:39pm

Hi @Mario_Castro,
I'm using Windows Server 2008R2 for filebeats, Windows Server R12 for Elastic.
I have upgraded all the stuff to 6.4.2, will keep an eye on filebeats.

Dmitriy_Parkhonin · October 5, 2018, 3:42pm

Hi @Mario_Castro,
The same problem with version 6.4.2:
In filebeat logs:

|2018-10-05T15:59:17.617+0100|INFO|pipeline/output.go:95|Connecting to backoff(elasticsearch(http://lonasdpricu2:9200))|
|2018-10-05T15:59:17.617+0100|INFO|elasticsearch/client.go:712|Connected to Elasticsearch version 6.4.2|
|2018-10-05T15:59:17.617+0100|INFO|template/load.go:129|Template already exists and will not be overwritten.|
|2018-10-05T15:59:17.618+0100|INFO|pipeline/output.go:105|Connection to backoff(elasticsearch(http://lonasdpricu2:9200)) established|
|2018-10-05T15:59:18.603+0100|ERROR|elasticsearch/client.go:317|Failed to perform any bulk index operations: Post http://lonasdpricu2:9200/_bulk: net/http: HTTP/1.x transport connection broken: write tcp xx.xx.xx.xx:56512->xx.xx.xx.xx:9200: wsasend: An existing connection was forcibly closed by the remote host.|
|2018-10-05T15:59:19.605+0100|ERROR|pipeline/output.go:121|Failed to publish events: Post http://lonasdpricu2:9200/_bulk: net/http: HTTP/1.x transport connection broken: write tcp xx.xx.xx.xx:56512->xx.xx.xx.xx:9200: wsasend: An existing connection was forcibly closed by the remote host.|
|2018-10-05T15:59:19.605+0100|INFO|pipeline/output.go:95|Connecting to backoff(elasticsearch(http://lonasdpricu2:9200))|
|2018-10-05T15:59:19.605+0100|INFO|elasticsearch/client.go:712|Connected to Elasticsearch version 6.4.2|
|2018-10-05T15:59:19.605+0100|INFO|template/load.go:129|Template already exists and will not be overwritten.|
|2018-10-05T15:59:19.621+0100|INFO|pipeline/output.go:105|Connection to backoff(elasticsearch(http://lonasdpricu2:9200)) established|
|2018-10-05T15:59:20.454+0100|ERROR|elasticsearch/client.go:317|Failed to perform any bulk index operations: Post http://lonasdpricu2:9200/_bulk: net/http: HTTP/1.x transport connection broken: write tcp xx.xx.xx.xx:56513->xx.xx.xx.xx:9200: wsasend: An existing connection was forcibly closed by the remote host.|

The situation got even worse: I cannot restore some of the filebeats by restarting them.
And there are no errors on ES side.

Maybe there is some restriction on number of connections to Elastic?

Mario_Castro · October 5, 2018, 3:52pm

Ok @Dmitriy_Parkhonin, I really don't have a clue right now about how to proceed so let's go step by step. I have found a user with a similar problem who found a solution. Can you try this? Filebeat on Windows xp forcibly closed by the remote host

At the same time, check the permissions of the files, specially the registry files that filebeat uses. I don't think that you're having problems with it, but just to check.

Confirm us that you aren't using logstash in the middle, just to double check that we shouldn't look there for issues.

Dmitriy_Parkhonin · October 5, 2018, 4:43pm

Hi @Mario_Castro,
The mentioned problem doesn't look similar to me - I don't use LogStash in the middle.
I have switched on traces on Elastic and found the following:

[2018-10-05T17:39:41,385][DEBUG][r.suppressed             ] path: /bad-request, params: {}
java.lang.IllegalArgumentException: text is empty (possibly HTTP/0.9)
	at io.netty.handler.codec.http.HttpVersion.valueOf(HttpVersion.java:63) ~[netty-codec-http-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.handler.codec.http.HttpRequestDecoder.createMessage(HttpRequestDecoder.java:87) ~[netty-codec-http-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.handler.codec.http.HttpObjectDecoder.decode(HttpObjectDecoder.java:219) ~[netty-codec-http-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489) ~[netty-codec-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428) [netty-codec-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265) [netty-codec-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) [netty-handler-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1359) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:935) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) [netty-common-4.1.16.Final.jar:4.1.16.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
[2018-10-05T17:39:41,385][TRACE][o.e.i.b.request          ] [request] Adjusted breaker by [16440] bytes, now [16440]
[2018-10-05T17:39:41,385][TRACE][o.e.i.b.request          ] [request] Adjusted breaker by [-16440] bytes, now [0]
[2018-10-05T17:39:41,385][TRACE][o.e.x.s.t.n.SecurityNetty4HttpServerTransport] [lonasdpricu2] channel closed: [id: 0x2add648b, L:/xx.xx.xx.xx:9200 ! R:/xx.xx.xx.xx:34900]

I have found out that this happens when my filebeat instance reads several dozens of files at startup with possibly big amount of data to be passed to Elastic.

Mario_Castro · October 5, 2018, 4:52pm

Uhmmm, "text is empty" message really confuses me but according to your description, filebeat backpressure system should handle potentially large files.

In any case, I'll try tweaking the input parameters in filebeat as described here https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-log.html

Specially haverster_limit and everything related with backoff

Can you also paste your exact filebeat configuration?

Dmitriy_Parkhonin · October 5, 2018, 4:59pm

The filebeat config:

filebeat.inputs:
- type: log
  enabled: false
  paths:
    - /var/log/*.log
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false
setup.template.settings:
  index.number_of_shards: 3
setup.kibana:
  host: "lonasdpricu2:5601"
output.elasticsearch:
  hosts: ["lonasdpricu2:9200"]

Plus module config:

module_version: "1.0"

var:
  - name: paths
    default:
      - D:\Logs\ice\uat-uk*\IpvNeo\IpvNeo.Server-*\SessionTraces-IpvNeo.Server-*.log
      - D:\Logs\ice\uat-uk*\SRCalibrator\SRCalibrator-*\SessionTraces-SRCalibrator-*.log 
      - D:\Logs\ice\uat-uk*\Marina\Marina-*\SessionTraces-Marina-*.log 
      - D:\Logs\ice\uat-uk*\Marina\MarinaGridNode-*\SessionTraces-MarinaGridNode-*.log 
      - D:\Logs\ice\prodcopy-uk1\Marina\Marina-*\SessionTraces-Marina-*.log 
      - D:\Logs\ice\prodcopy-uk1\Marina\MarinaGridNode-*\SessionTraces-MarinaGridNode-*.log 

input: config/dprice-messages.yml
ingest_pipeline: ingest/default.json

config/dprice-messages.yml:

type: log
enabled: true
paths:
{{ range $i, $path := .paths }}
 - {{$path}}
{{ end }}
exclude_files: [".gz$", ".zip"]
fields_under_root: true
fields:
  log_type: "messages"
tags: [
  "json"
]
json.keys_under_root: true
close_renamed: true
close_removed: true
close_timeout: 1m
scan_frequency: 15s
close_inactive: 1m
ignore_older: 120h

Dmitriy_Parkhonin · October 10, 2018, 3:52pm

@Mario_Castro, I suppose that the problem is in size of the messages. I try to handle the files with JSON records that may be of 100 MB in size. This big amount of data is produced in burst usually.
I have tried to increase the parameter max_bytes, but I cannot predict the exact number of monitored files (15-20 or even more) and the upper value for the parameter would be rather big.

system · November 7, 2018, 3:52pm

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Filebeat fails to send logs to Elasticsearch Beats filebeat	2	1135	June 20, 2019
Filebeat stopped frequently when sending to Logstash Beats filebeat	5	1508	July 27, 2018
Filebeat stops sending to es eventually, bulk index 400 Beats filebeat	10	4171	July 14, 2017
Filebeat 6.7.0 not pushing logs to Elasticsearch 6.7.0 after upgrade Beats filebeat	6	6100	April 30, 2019
Filebeat stops sending logs with no reason Beats	5	4889	October 11, 2017

Filebeat stops connecting to Elastic after a while

Related topics