Filebeat stops connecting to Elastic after a while


(Dmitriy Parkhonin) #1

I have a setup with several dozens of filebeats (v.6.2.2) that are connected to Elastic (v.6.2.2 too).
Each filebeat monitors for several dozens of log files.
Some of the beats stop functioning properly sometimes, I can see the following in filebeat log:

|2018-10-03T12:41:23.282+0100|INFO|elasticsearch/client.go:690|Connected to Elasticsearch version 6.2.2|
|---|---|---|---|
|2018-10-03T12:41:23.282+0100|INFO|template/load.go:73|Template already exists and will not be overwritten.|
|2018-10-03T12:41:24.086+0100|ERROR|elasticsearch/client.go:299|Failed to perform any bulk index operations: Post http://lonasdpricu2:9200/_bulk: net/http: HTTP/1.x transport connection broken: write tcp xx.xx.xx.xx:64229->xx.xx.xx.xx:9200: wsasend: An existing connection was forcibly closed by the remote host.|
|2018-10-03T12:41:25.124+0100|ERROR|pipeline/output.go:92|Failed to publish events: Post http://lonasdpricu2:9200/_bulk: net/http: HTTP/1.x transport connection broken: write tcp xx.xx.xx.xx:64229->xx.xx.xx.xx:9200: wsasend: An existing connection was forcibly closed by the remote host.|
|2018-10-03T12:41:25.124+0100|INFO|elasticsearch/client.go:690|Connected to Elasticsearch version 6.2.2|
|2018-10-03T12:41:25.124+0100|INFO|template/load.go:73|Template already exists and will not be overwritten.|
|2018-10-03T12:41:25.947+0100|ERROR|elasticsearch/client.go:299|Failed to perform any bulk index operations: Post http://lonasdpricu2:9200/_bulk: net/http: HTTP/1.x transport connection broken: write tcp xx.xx.xx.xx:64231->xx.xx.xx.xx:9200: wsasend: An existing connection was forcibly closed by the remote host.|
|2018-10-03T12:41:26.987+0100|ERROR|pipeline/output.go:92|Failed to publish events: Post http://lonasdpricu2:9200/_bulk: net/http: HTTP/1.x transport connection broken: write tcp xx.xx.xx.xx:64231->xx.xx.xx.xx:9200: wsasend: An existing connection was forcibly closed by the remote host.|

There are no errors in Elastic log at this time.
The filebeat starts to process logs again only after restart.
Could you please help me with the issue? It is rather annoying to find out that some data is missing and then finding out which filebeat is stuck and then restarting it...


(Mario Castro) #2

Hi @Dmitriy_Parkhonin

Which operating system are you using? I'd suggest to update Filebeat to 6.4 anyways, just in case. We had some known errors in Windows OS as you can see here https://github.com/elastic/beats/issues/905


(Dmitriy Parkhonin) #3

Hi @Mario_Castro,
I'm using Windows Server 2008R2 for filebeats, Windows Server R12 for Elastic.
I have upgraded all the stuff to 6.4.2, will keep an eye on filebeats.


(Dmitriy Parkhonin) #4

Hi @Mario_Castro,
The same problem with version 6.4.2:
In filebeat logs:

|2018-10-05T15:59:17.617+0100|INFO|pipeline/output.go:95|Connecting to backoff(elasticsearch(http://lonasdpricu2:9200))|
|2018-10-05T15:59:17.617+0100|INFO|elasticsearch/client.go:712|Connected to Elasticsearch version 6.4.2|
|2018-10-05T15:59:17.617+0100|INFO|template/load.go:129|Template already exists and will not be overwritten.|
|2018-10-05T15:59:17.618+0100|INFO|pipeline/output.go:105|Connection to backoff(elasticsearch(http://lonasdpricu2:9200)) established|
|2018-10-05T15:59:18.603+0100|ERROR|elasticsearch/client.go:317|Failed to perform any bulk index operations: Post http://lonasdpricu2:9200/_bulk: net/http: HTTP/1.x transport connection broken: write tcp xx.xx.xx.xx:56512->xx.xx.xx.xx:9200: wsasend: An existing connection was forcibly closed by the remote host.|
|2018-10-05T15:59:19.605+0100|ERROR|pipeline/output.go:121|Failed to publish events: Post http://lonasdpricu2:9200/_bulk: net/http: HTTP/1.x transport connection broken: write tcp xx.xx.xx.xx:56512->xx.xx.xx.xx:9200: wsasend: An existing connection was forcibly closed by the remote host.|
|2018-10-05T15:59:19.605+0100|INFO|pipeline/output.go:95|Connecting to backoff(elasticsearch(http://lonasdpricu2:9200))|
|2018-10-05T15:59:19.605+0100|INFO|elasticsearch/client.go:712|Connected to Elasticsearch version 6.4.2|
|2018-10-05T15:59:19.605+0100|INFO|template/load.go:129|Template already exists and will not be overwritten.|
|2018-10-05T15:59:19.621+0100|INFO|pipeline/output.go:105|Connection to backoff(elasticsearch(http://lonasdpricu2:9200)) established|
|2018-10-05T15:59:20.454+0100|ERROR|elasticsearch/client.go:317|Failed to perform any bulk index operations: Post http://lonasdpricu2:9200/_bulk: net/http: HTTP/1.x transport connection broken: write tcp xx.xx.xx.xx:56513->xx.xx.xx.xx:9200: wsasend: An existing connection was forcibly closed by the remote host.|

The situation got even worse: I cannot restore some of the filebeats by restarting them.
And there are no errors on ES side.

Maybe there is some restriction on number of connections to Elastic?


(Mario Castro) #5

Ok @Dmitriy_Parkhonin, I really don't have a clue right now about how to proceed so let's go step by step. I have found a user with a similar problem who found a solution. Can you try this? Filebeat on Windows xp forcibly closed by the remote host

At the same time, check the permissions of the files, specially the registry files that filebeat uses. I don't think that you're having problems with it, but just to check.

Confirm us that you aren't using logstash in the middle, just to double check that we shouldn't look there for issues.


(Dmitriy Parkhonin) #6

Hi @Mario_Castro,
The mentioned problem doesn't look similar to me - I don't use LogStash in the middle.
I have switched on traces on Elastic and found the following:

[2018-10-05T17:39:41,385][DEBUG][r.suppressed             ] path: /bad-request, params: {}
java.lang.IllegalArgumentException: text is empty (possibly HTTP/0.9)
	at io.netty.handler.codec.http.HttpVersion.valueOf(HttpVersion.java:63) ~[netty-codec-http-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.handler.codec.http.HttpRequestDecoder.createMessage(HttpRequestDecoder.java:87) ~[netty-codec-http-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.handler.codec.http.HttpObjectDecoder.decode(HttpObjectDecoder.java:219) ~[netty-codec-http-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:489) ~[netty-codec-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:428) [netty-codec-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:265) [netty-codec-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286) [netty-handler-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.ChannelInboundHandlerAdapter.channelRead(ChannelInboundHandlerAdapter.java:86) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1359) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:935) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:134) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:545) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:499) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459) [netty-transport-4.1.16.Final.jar:4.1.16.Final]
	at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858) [netty-common-4.1.16.Final.jar:4.1.16.Final]
	at java.lang.Thread.run(Thread.java:748) [?:1.8.0_181]
[2018-10-05T17:39:41,385][TRACE][o.e.i.b.request          ] [request] Adjusted breaker by [16440] bytes, now [16440]
[2018-10-05T17:39:41,385][TRACE][o.e.i.b.request          ] [request] Adjusted breaker by [-16440] bytes, now [0]
[2018-10-05T17:39:41,385][TRACE][o.e.x.s.t.n.SecurityNetty4HttpServerTransport] [lonasdpricu2] channel closed: [id: 0x2add648b, L:/xx.xx.xx.xx:9200 ! R:/xx.xx.xx.xx:34900]

I have found out that this happens when my filebeat instance reads several dozens of files at startup with possibly big amount of data to be passed to Elastic.


(Mario Castro) #7

Uhmmm, "text is empty" message really confuses me but according to your description, filebeat backpressure system should handle potentially large files.

In any case, I'll try tweaking the input parameters in filebeat as described here https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-input-log.html

Specially haverster_limit and everything related with backoff

Can you also paste your exact filebeat configuration?


(Dmitriy Parkhonin) #8

The filebeat config:

filebeat.inputs:
- type: log
  enabled: false
  paths:
    - /var/log/*.log
filebeat.config.modules:
  path: ${path.config}/modules.d/*.yml
  reload.enabled: false
setup.template.settings:
  index.number_of_shards: 3
setup.kibana:
  host: "lonasdpricu2:5601"
output.elasticsearch:
  hosts: ["lonasdpricu2:9200"]

Plus module config:

module_version: "1.0"

var:
  - name: paths
    default:
      - D:\Logs\ice\uat-uk*\IpvNeo\IpvNeo.Server-*\SessionTraces-IpvNeo.Server-*.log
      - D:\Logs\ice\uat-uk*\SRCalibrator\SRCalibrator-*\SessionTraces-SRCalibrator-*.log 
      - D:\Logs\ice\uat-uk*\Marina\Marina-*\SessionTraces-Marina-*.log 
      - D:\Logs\ice\uat-uk*\Marina\MarinaGridNode-*\SessionTraces-MarinaGridNode-*.log 
      - D:\Logs\ice\prodcopy-uk1\Marina\Marina-*\SessionTraces-Marina-*.log 
      - D:\Logs\ice\prodcopy-uk1\Marina\MarinaGridNode-*\SessionTraces-MarinaGridNode-*.log 

input: config/dprice-messages.yml
ingest_pipeline: ingest/default.json

config/dprice-messages.yml:

type: log
enabled: true
paths:
{{ range $i, $path := .paths }}
 - {{$path}}
{{ end }}
exclude_files: [".gz$", ".zip"]
fields_under_root: true
fields:
  log_type: "messages"
tags: [
  "json"
]
json.keys_under_root: true
close_renamed: true
close_removed: true
close_timeout: 1m
scan_frequency: 15s
close_inactive: 1m
ignore_older: 120h

(Dmitriy Parkhonin) #9

@Mario_Castro, I suppose that the problem is in size of the messages. I try to handle the files with JSON records that may be of 100 MB in size. This big amount of data is produced in burst usually.
I have tried to increase the parameter max_bytes, but I cannot predict the exact number of monitored files (15-20 or even more) and the upper value for the parameter would be rather big.


(system) #10

This topic was automatically closed 28 days after the last reply. New replies are no longer allowed.